ABSTRACT


Military aircraft maintenance methods are moving from practices based on hard-time
inspection and replacement intervals to Condition Based Maintenance (CBM). CBM makes
it possible to forgo scheduled maintenance on components or systems that are not in
need of maintenance or replacement. CBM reduces maintenance effort and component
replacement and increases readiness and safety.

Goodrich Corporation has developed the Integrated Mechanical Diagnostics 
Health and Usage Management System (IMD-HUMS) to support CBM in helicopters. 
Great benefits in several maintenance practices, readiness and safety have already been 
realized by the UH-60L helicopter military unit equipped with the IMD-HUMS system. 

The total potential of the system for the components observed by the IMD-HUMS,
however, has not yet been achieved. The IMD-HUMS gathers an enormous amount of data
on the condition of these components and systems. The meaning and full potential of
this data have not yet been realized because, to date, the data has never been
coupled with corresponding maintenance data.

The purpose of this research is to conduct and document statistical analysis of 
IMD-HUMS produced data with corresponding maintenance data of observed component 
failures. Statistical applications of logistic regression and classification trees are explored 
to predict failures. The approaches used in the exploration of the IMD-HUMS 
acquisition data sets are based on sixty electrical generators from thirty aircraft, six of 
which displayed degradation or failure and hence required maintenance actions. This 
approach is promising. With it we accurately predict two previously undocumented 
failures. 

EXECUTIVE SUMMARY


Military aircraft maintenance methods are moving from practices based on hard-time
inspection and replacement intervals to Condition Based Maintenance (CBM). CBM makes
it possible to forgo scheduled maintenance on components or systems which have
reached their high times but are not in need of maintenance or replacement. Benefits
of CBM are the minimization of maintenance efforts and component replacement along
with an increase in readiness and safety.

Goodrich Corporation has developed the Integrated Mechanical Diagnostics
Health and Usage Management System (IMD-HUMS) to support CBM in helicopters.
The UH-60L helicopter military unit using the IMD-HUMS has already realized great
benefits in several maintenance practices, readiness, and safety.

The total potential of the system, with regard to these benefits for the multiple
components observed by the IMD-HUMS, however, is not yet achieved. The IMD-HUMS
gathers a great deal of pertinent, important data on the condition of multiple
components and systems, but the meaning and full potential of all this data are not
yet fully realized.

The purpose of this research is to conduct and document the statistical analysis of 
IMD-HUMS produced data. Statistical applications of logistic regression and random 
forest of classification trees are explored. The approaches used in the exploration of the 
IMD-HUMS acquisition data sets are based on six electrical generators which displayed 
degradation or failure—and hence required maintenance actions—compared with sixty 
others which did not. This thesis focuses on using the combination of resulting vibratory 
patterns and maintenance records from one type of component, the electrical generator of 
the UH-60L helicopter, to forecast the need for maintenance. Data acquired from the
IMD-HUMS will be used in an attempt to understand and predict the health of the
UH-60L electrical generator, in hopes of gaining insights into developing
component health predictions from IMD-HUMS data for other components.



This thesis discusses how the resulting predicted health classifications compare to
how each of the generators is currently classified. In this process, some surprising
cases of generator health classification are uncovered. One generator that was wrongly
presumed to be bad and another that was wrongly assumed to be good were both
classified correctly by this study's classification scheme. The thesis demonstrates
that two different models, logistic regression and a random forest of classification
trees, can be fit using IMD-HUMS data collected from known cases of failed generators
and properly operating generators. These models can predict the overall state of a
UH-60L electrical generator.





I. INTRODUCTION


There are over 12,000 aircraft in the U.S. military's inventory, with nearly 2,400 
in the Navy and Marine Corps (International Institute for Strategic Studies, 2005). In 
Fiscal Year 2005, Congress obligated over 5.29 billion dollars toward the operation and 
maintenance of these Naval aircraft, with 1.08 billion dollars of this money obligated 
strictly to intermediate and depot-level maintenance (Office of the Undersecretary of 
Defense, 2005). 

To put these operation and maintenance costs into perspective, consider that 
flying a single CH-53E helicopter for one flight hour costs $14,000 and requires 44 
maintenance man-hours (http://www.aviationtoday.com, Nov 2005). A solution to these 
high costs may be found through the services' concerted efforts to move away from a 
scheduled maintenance approach toward a combination of scheduled and condition based 
maintenance (CBM). In fact, this has been mandated as the Department of Defense 
(DoD) required strategy to improve aircraft supportability (DoD Instruction 5000.2, May 
2003). For helicopters, this means monitoring the conditions of the mechanical 
components, which account for 70% of maintenance costs (Ruben & Rossi, 2003). 

Monitoring of these components is best accomplished through the collection of 
these components' vibratory patterns. Goodrich Corporation has developed the Integrated 
Mechanical Diagnostics Health and Usage Management System (IMD-HUMS) which 
collects and analyzes a helicopter component's vibrations for use in CBM. The system 
has been installed and operational for over two years in 30 U.S. Army UH-60L 
helicopters. This provides the opportunity, for the first time, to investigate data produced 
by IMD-HUMS installed in a large fleet of operational helicopters, rather than data from 
test stand mounted fault-induced components or test-bed aircraft. 

The IMD-HUMS is worthy of study because major economic, operational and 
safety benefits can be realized by incorporating such CBM systems into aircraft 
maintenance practices. This thesis focuses on using the combination of resulting 
vibratory patterns and maintenance records from one type of component, the electrical 
generator, to forecast the need for maintenance. The data is explored and analyzed using
statistical approaches in hopes of gaining insights into developing component health
predictions from IMD-HUMS data.


A. LITERATURE REVIEW 

Numerous papers describe the IMD-HUMS; however, very little work concerns 
specific analysis of operational helicopter vibratory patterns, and even less focuses on 
relating changes in the vibratory patterns to actual operational maintenance events. 

The "Systems Users Manual for IMD-HUMS" (U.S. Army Publication, 2005) and 
the "P^I VPU/DTD Software Requirements Specifications" (Goodrich Publication, 2001) 
provide the basic terminology, concept of operations, and an explanation of the physical 
measurements regarding the IMD-HUMS. Understanding the physics behind the 
vibratory patterns is essential for predicting component health. 

Various papers and briefs written primarily by employees of Goodrich 
Corporation and the IMD-HUMS Program Managers Office provide an overview of the 
uses and issues concerning the IMD-HUMS. Hess, Duke and Kogut (2005) provide a 
good overview of the development history, terms, functionality, and potential of the 
IMD-HUMS. The master’s thesis by Revor (2004) uses discrete event simulation backed 
by Naval Aviation Logistics Analysis (NALDA) databases to investigate the cost benefits 
of incorporating the IMD-HUMS into helicopter rotor track and balance maintenance 
actions. Revor's simulation supports the idea that using the IMD-HUMS will decrease 
costs and maintenance efforts. 

Several Goodrich papers also discuss the mathematical concepts and algorithmic 
inner workings of the IMD-HUMS in detail. These papers provide insight into the 
complexity and potential of the system; for example, see Bechhoefer and Power (2002) 
and Hochmann (2004). The latter paper addresses the issue of variability among 
vibratory pattern observations which originate from seemingly identical operating 
conditions. 

The master’s thesis by Elyurek (2003) presents empirical studies of vibratory
patterns. Elyurek uses Box-Jenkins time series modeling with regression to
determine vibration thresholds for gear fault identification. His study is based on
operational data produced by a test CH-53 helicopter with the IMD-HUMS installed. He
concludes that his model could not match the required negligible alarm rate due to the
small sample size available.

Only recently has it been possible to look at vibratory patterns matched with 
corresponding operational maintenance events. Wright (2005) investigates several cases
of maintenance discrepancy detections made by the 30 UH-60L helicopters with the
IMD-HUMS installed. In three particular cases, the IMD-HUMS data indicated that the
generator was about to fail before it actually did. The paper explains the subsequent
investigation and facts concerning these generators. The apparent relationship between
changes in vibratory patterns and the failed generators described by Wright provides the
motivation for choosing UH-60L generators for this study. In addition, the paper
discusses processes developed to incorporate the IMD-HUMS data into beneficial 
maintenance practices. 

B. RESEARCH FOCUS 

With the exception of Wright's paper, there are no published works that
empirically relate vibratory patterns to documented operational maintenance events. The
potential of CBM, and of the IMD-HUMS in particular, has not yet been fully realized.
The objective of CBM is to know, from the data collected by sensor readings, when a 
component or system needs replacement or maintenance. A simple analogy to CBM is 
when a medical doctor observes a person’s temperature, blood pressure and heart rate. 
The readings could mean many different things under different circumstances, but an 
experienced doctor would be able to tell whether that person is in good health, and
specifically what medical actions to take. Now, imagine the first time in history a doctor
listened to a heartbeat. He knew this information was important and could explain a
great deal concerning a patient's health, but everything the patient's heartbeat could tell
the doctor was not yet known. This is where we are now with much of the data resulting from
the IMD-HUMS. The IMD-HUMS data tells the user something about each monitored
component’s health and future health, but exactly what it tells is deserving of study.

This issue is addressed in this thesis. Data acquired from a CBM-based system 
(the IMD-HUMS) will be used in an attempt to understand and predict the state, 
condition and performance of a component (the UH-60L electrical generator). 

The UH-60L electrical generators were chosen for study for two reasons. First,
during the two years the IMD-HUMS was installed in the helicopters, six generators
had to be removed from operation because of a fault, and 60 generators were deemed to
be working properly. This provides a data set in which generators can be classified as
"bad" (removed for some fault) or "good" (working properly). Second, the electrical
generators are relatively simple components to
study when compared to aircraft engines or transmissions. The generators have fewer 
moving parts which produce vibrations and are much less likely to be affected by factors 
such as flight regime or torque settings. 

C. APPROACH 

This thesis's approach for assessing the generators' health is somewhat different
from the current method of health assessment used with the IMD-HUMS. Currently a
component's overall health is assessed by using a Health Indicator (HI) for
that component. Each component HI is computed from a subset of IMD-HUMS
vibratory readings known as Condition Indicators (CI). A component's HI is a statistic
which summarizes when the CI corresponding to that component have unusual values
compared to the historical distributions of these CI. The CI readings are taken only
from specific parts within the component itself. For instance, the generator's health is
monitored by the HI computed from CI originating from the generator's shaft. Rather
than attempt to supplant the current method by using different CI or by changing how the
HI are computed from the CI, the approach used in this thesis augments the current
method.
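
The idea of an HI as a summary of unusual Condition Indicator (CI) values can be
illustrated with a minimal Python sketch. The actual IMD-HUMS computation is
proprietary and is not reproduced here; the z-score rule, the 3-sigma limit, the
function name health_indicator and the CI names below are all assumptions made only
for illustration.

```python
import statistics


def health_indicator(history, current, z_limit=3.0):
    """Return the fraction of CIs whose current value is unusual.

    history maps a CI name to its past readings; current maps the same
    names to the latest reading. A CI counts as unusual when it lies more
    than z_limit historical standard deviations from its historical mean.
    This is an assumed rule, not the IMD-HUMS algorithm.
    """
    flagged = 0
    for name, value in current.items():
        past = history[name]
        mu = statistics.mean(past)
        sigma = statistics.stdev(past)
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            flagged += 1
    return flagged / len(current)


# Hypothetical CI names and readings, not taken from IMD-HUMS data.
history = {
    "shaft_ci_a": [1.00, 1.10, 0.90, 1.00, 1.05],
    "shaft_ci_b": [0.50, 0.52, 0.48, 0.51, 0.49],
}
current = {"shaft_ci_a": 1.60, "shaft_ci_b": 0.51}

print(health_indicator(history, current))  # one of the two CIs is flagged
```

A real indicator would likely weight the CI differently and account for flight
regime; the sketch only shows the comparison against historical distributions.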

First, to assess generator health a broader set of CI is used. Not only are CI
originating from the generator shaft vibratory patterns used, but CI from the vibratory
patterns of the nearby supporting gear and bearing are also used. Second, for the
UH-60L generators, there are two years of empirical IMD-HUMS data along with
corresponding maintenance records for 30 aircraft, each with two generators. This data
set should be large enough to contain examples of the most common failure modes of
generators along with their corresponding vibratory patterns. The data set also contains
examples of healthy generators along with their vibratory patterns. A classification
scheme is developed based on these examples of good and bad generators and their
vibratory patterns as measured by the approximately 170 CI related to the generator shaft,
bearing and gear.

The classification scheme uses a logistic regression fit to the data which estimates
the probability of the generator being bad as a function of the CI. This logistic regression
fit does not explicitly take into account the time series nature of CI readings. Therefore,
as a basis for classification, a loess smoother of the probabilities predicted over time by
the logistic regression is used. To test the predictive ability of the classification scheme,
the generator data is divided into two sets: a training set and an experimental set. Only 
the training set is used in the logistic regression fit. The classification scheme is then 
tested on the experimental data which contains a bad generator, several good generators, 
and generators of questionable health. 
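
The scheme just described (fit a logistic regression on training acquisitions, predict
a probability for each acquisition of an experimental generator, then smooth those
probabilities over time) can be sketched in plain Python. This is a minimal
illustration under stated assumptions, not the thesis's implementation: the data are
synthetic, only two CI are used instead of roughly 170, the smoother is a simple
tricube-weighted local linear fit standing in for loess, and the 0.5 decision rule is
hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated probability that an acquisition with CI vector x is 'bad'."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights (intercept first) by gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def local_linear_smooth(t, p, frac=0.5):
    """Tricube-weighted local linear fit at each time (a loess-style smoother)."""
    k = max(2, math.ceil(frac * len(t)))
    smoothed = []
    for t0 in t:
        h = sorted(abs(ti - t0) for ti in t)[k - 1] or 1e-12
        wts = [(1 - min(abs(ti - t0) / h, 1.0) ** 3) ** 3 for ti in t]
        sw = sum(wts)
        mt = sum(wi * ti for wi, ti in zip(wts, t)) / sw
        mp = sum(wi * pi for wi, pi in zip(wts, p)) / sw
        sxx = sum(wi * (ti - mt) ** 2 for wi, ti in zip(wts, t))
        sxy = sum(wi * (ti - mt) * (pi - mp) for wi, ti, pi in zip(wts, t, p))
        smoothed.append(mp + (sxy / sxx if sxx > 0 else 0.0) * (t0 - mt))
    return smoothed


rng = random.Random(0)
# Synthetic training set: two CI per acquisition, label 1 = bad, 0 = good.
X_train = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(100)]
X_train += [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(100)]
y_train = [0] * 100 + [1] * 100
w = fit_logistic(X_train, y_train)

# Synthetic experimental generator whose CI drift upward over 40 acquisitions.
times = list(range(40))
acqs = [[3 * i / 39 + rng.gauss(0, 0.3), 3 * i / 39 + rng.gauss(0, 0.3)] for i in times]
raw = [predict_proba(w, x) for x in acqs]
smoothed = local_linear_smooth(times, raw)

# Hypothetical decision rule: call the generator bad when the smoothed
# probability at the latest acquisition exceeds 0.5.
label = "bad" if smoothed[-1] > 0.5 else "good"
print(label, round(smoothed[0], 2), round(smoothed[-1], 2))
```

Smoothing before classifying keeps a single noisy acquisition from flipping the
label, which is the reason the text gives for not classifying on the raw
per-acquisition probabilities.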

D. OUTLINE OF STUDY 

Chapter II gives the background needed to understand this study. In particular it 
provides an overview of CBM, IMD-HUMS, the UH-60L helicopter and its electrical 
generators. This chapter also provides fundamental knowledge concerning the CI and HI
used in this study. This is important because the data set of flight regimes and vibratory 
patterns for 30 aircraft over two years of operation is very large. It contains both a large 
number of variables and a large number of records. 

Chapter III describes the data set and how it is partitioned into the training and 
experimental sets. The vibratory patterns and flight regime data are also studied for both 
good and bad generators in the training set. This analysis chapter begins with graphical 
exploration to investigate differences in the training data among the good and bad
generators, as well as differences among just the bad generators. 

In the second part of the analysis, a parametric model, logistic regression 
(Montgomery, 2001), is fit to the training data. As a check of the estimated probabilities 
of the generator being bad, a nonparametric model, a random forest of classification trees 
(Berk, 2005), is also fit to the training data. These two models give respectively an 
estimated probability of a generator being bad and a classification of a generator being 
bad or good for each acquisition. 

In Chapter IV the logistic regression and random forest classifiers are applied to 
each of the generators used in the study. The probabilities of being predicted bad from 
the logistic regression are plotted over time and then smoothed. These smoothed versions 
are used to classify each generator in the training and experimental data set as good or 
bad. The end of the chapter carefully discusses how these predicted classifications
compare to how each of the generators is actually classified. In this process, some
surprising cases of generator health classification are uncovered. One generator which
was wrongly presumed to be bad and, conversely, another generator which was wrongly
assumed to be good were classified correctly by this study's approach.

Conclusions and recommendations for further study are given in the final chapter. 

II. BACKGROUND: SYSTEMS AND CONCEPTS


This chapter introduces and explains the concepts of scheduled maintenance,
CBM and vibration analysis. A description of the IMD-HUMS and its principles of
operation is given because this system acquires, manipulates and stores the data. A brief
description of the UH-60L helicopter, its electrical generators and supporting components
is included since they are the source of the studied data.

A. AIRCRAFT MAINTENANCE CONCEPTS 

There are two very different concepts in the way military aircraft maintenance is 
performed. The first, scheduled maintenance, uses traditional methods based upon time 
of usage. The other is CBM, which is heavily dependent upon vibration monitoring and
diagnostics. 


1. Scheduled Maintenance

Currently, maintenance on most military aircraft is performed under the
concept of scheduled maintenance, or Time Before Overhaul (TBO). One of two cases
results in a required maintenance action: either a component or system noticeably fails
or operates in a noticeably degraded mode, in which case it is replaced or fixed; or the
component or system reaches a pre-determined amount of usage, at which time it is
replaced or inspected. The inspections or replacements are based upon set hard-times
of usage. For example, there may be a requirement for a phase inspection after
100 hours of pilot logged flight time, transmission and engine replacement after a specific
number of flight hours, jet engine power tests after a designated number of usage hours,
or replacement of the tail-hook on a carrier-based aircraft after a specific number of traps.
The number of flight hours or usage until required maintenance is determined by design 
engineers based upon the probability of when the component is most likely to fail and the 
severity of the consequences of its failure. These usage intervals are historically and 
purposefully set to be in a conservative to extremely conservative range. The greater the 
severity of the consequence of failure, the more conservative the required inspection or
replacement time becomes. Design engineers will set an inspection interval or 
replacement time that ensures the component is inspected several times or replaced 
before the expected failure (Rotor & Wing Magazine, April 2005). For instance, if the 
bearings on a helicopter's rotor head system are expected to fail or to wear to an 
unacceptable level after 500 flight hours, the design engineers may dictate a phase 
inspection where the bearings are disassembled and inspected every 100 hours, and then 
replaced regardless of condition by 300 hours. An aspect of the scheduled TBO 
maintenance concept is that, as the name implies, maintenance actions are tied to time. 
Maintenance planners must adhere to dictated usage limits. Sometimes there is a window 
of time, an allowable plus or minus percentage of usage, permitting some flexibility in 
planning. The counting and tracking of usage is critical in scheduled TBO maintenance. 

While the scheduled or TBO concept of maintenance practices has served the 
military well over many years, the concept has several inherent drawbacks. The first is
that a preponderance of inspections or replacements is conducted on perfectly
functioning components only because the usage time dictates so. If maintenance actions
were performed only when a component was known to be in a state of unacceptable
degradation or definitively failing, a great deal of time, effort and cost could be saved.
Many inspections could be eliminated and perfectly functioning parts could remain in 
operation until they were known to be in one of the above-mentioned states. Another 
drawback of scheduled TBO is that it is rarely based on the history of the components.
Using the prior example of the bearings in a helicopter's rotor system, if sufficient data
had been collected to indicate that only 1 in 1000 bearings had degraded by the 500
flight hour TBO, perhaps an inspection interval of 400 hours could produce the
same or better safety and readiness levels with a savings in time, maintenance effort and
costs. Historical data is rarely incorporated into the scheduled TBO maintenance
concept.
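
The interval trade-off in the bearing example can be made concrete with a little
arithmetic. The sketch below counts scheduled inspections per bearing before the 500
flight hour point for the two intervals mentioned in the text. The function name
inspections_before and the fleet size of 1,000 bearings are illustrative assumptions,
and the calculation says nothing about the resulting safety level.

```python
def inspections_before(limit_hours, interval_hours):
    """Count inspections at interval, 2*interval, ... strictly before limit_hours.

    Assumes whole-number flight hours.
    """
    return (limit_hours - 1) // interval_hours


bearings = 1000  # hypothetical fleet size, chosen to match the "1 in 1000" example
for interval in (100, 400):
    per_bearing = inspections_before(500, interval)
    # interval, inspections per bearing, inspections across the assumed fleet
    print(interval, per_bearing, per_bearing * bearings)
```

Under these assumptions the 100-hour interval requires four inspections per bearing
before 500 hours, against one for the 400-hour interval.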

2. Condition Based Maintenance

A different approach to the performance of aircraft maintenance is CBM. The 
underlying concept of CBM is to perform aircraft maintenance only when monitoring 
sensors indicate that maintenance is needed or will be needed on a component or system. 
Maintenance planners and maintenance actions are not tied to the counting and tracking 
of usage; rather the focus is on a component's state or condition. Monitoring sensors 
collect and record status and performance data of specific components while in use. 
From this data the actual condition, or state, of the components is then inferred for the 
user. This provides the ability to forego scheduled maintenance on components or 
systems which have reached their high times but are still functioning properly. Likewise, 
the user can specifically identify a failed or degraded component before its scheduled 
inspection and take immediate corrective maintenance action. Additionally, if a 
maintenance planner is alerted to the fact that a component is degrading, or that its 
performance is lessening although still operating at an acceptable level, the planner is 
afforded greater flexibility in the scheduling of maintenance. The maintainer
understands not only that the component is wearing but also, perhaps, at what rate, and
from that information can choose the time of a required maintenance action.
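
One simple way to turn a wear rate into a maintenance date is to fit a straight line
to a monitored reading and extrapolate to a limit. The following is a minimal sketch
of that idea only; the linear trend, the limit value, and the names fit_line and
hours_until_limit are assumptions for illustration and are not described in the
source.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def hours_until_limit(hours, readings, limit):
    """Flight hours remaining until the fitted trend reaches limit.

    Returns None when the trend is flat or decreasing.
    """
    intercept, slope = fit_line(hours, readings)
    if slope <= 0:
        return None
    return max(0.0, (limit - intercept) / slope - hours[-1])


# Hypothetical vibration readings that rise 0.2 units every 10 flight hours.
hours = [0, 10, 20, 30, 40]
readings = [1.0, 1.2, 1.4, 1.6, 1.8]
print(hours_until_limit(hours, readings, limit=3.0))  # roughly 60 flight hours
```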

In summary, the goal of the move to CBM is to rapidly and accurately identify 
faults in order to eliminate time-consuming inspections and unnecessary component 
replacements. Potential benefits of CBM are the minimization of maintenance efforts 
and component replacement along with an increase in readiness and safety. Thus, the 
CBM concept has the potential to eliminate the shortfalls of scheduled TBO maintenance. 

Cases of success have already been demonstrated by the IMD-HUMS operating in 
the 30 UH-60L helicopters. For example, the system was able to determine the cause of a
persistent buzz felt by aircrew during flight. For 400 flight hours prior to the installation
of the IMD-HUMS, the buzz had been unidentifiable. After IMD-HUMS installation, the
source of the vibrations was isolated to the electrical generator. Upon removal of the
generator the spline adapter was found to be severely worn. Replacement of the adapter 
eliminated the buzz. Other benefits have been realized in regard to several maintenance 
practices, readiness, and safety. During the thesis experience tour in which the system
was demonstrated to the authors, maintainers reported that, when using the system, the
processes of both "main rotor track and balance" and "tail rotor vibes" had become much
simpler, quicker and more reliable with respect to maintenance requirements. For more
successful applications of the CBM concept refer to Collacott (1979) which lists case 
studies and resulting benefits of CBM in the shipping, mining, production, nuclear power 
and aviation industries. For a better understanding of the DoD strategy and issues of 
CBM see Butcher (2000). This report addresses the benefits and rewards the military 
services are reaping through CBM as well as issues concerning further implementation of 
CBM. The IMD-HUMS is one of the key CBM programs examined as a case study in the report.

B. IMD-HUMS 

1. Purpose 

"...in the 22 years I've been in the Army, this is the best program as far as going 
from reactive to pro-active maintenance..." Sergeant First Class Reeve, Delta Co, 4th Bn, 
101st AVN Div, 7 June 2005. 

Coming from one of the maintainers of the 30 US Army UH-60L helicopters with 
IMD-HUMS, this quote by SFC Reeve lends credence to the potential and worth of the 
IMD-HUMS. The US Army plans to install the IMD-HUMS on all of its UH-60M 
helicopters. In addition, the system has been purchased by the US Navy for installation 
into CH-53E helicopters. The Navy is also considering installation of this system on the 
H-60, UH-1, AH-1, and V-22 aircraft (NAVAIR e-mail, 8 July 2005). Goodrich 
Corporation began development of the IMD-HUMS to perform CBM on helicopters in 
1997 under the auspices of the DoD Commercial Operations & Support Savings Initiative 
(COSSI). The underlying purpose of the IMD-HUMS is to improve flight readiness and
safety, with the added bonus of savings in maintenance effort, time and costs (Hess,
2001).

The IMD-HUMS provides automated equipment usage tracking for life-limited 
components, from entry into service until retirement. The usage tracking is used not only 
in the continuation of scheduled TBO maintenance practices, but also for determining 
accurate component lifetimes and for developing component fault prediction models.
Instrumentation aboard the aircraft collects usage data during aircraft operations, which is
then applied to life-limited components currently installed on the aircraft (IMD-HUMS
User Manual, 2005). With the IMD-HUMS, component usage times are automatically
counted and tracked for TBO; previously this process was conducted manually. Most
importantly, usage times may be computed from any number of variables, including time
spent in various flight regimes. It stands to reason that components of aircraft which fly
mostly straight and level and take off and land at improved airfields wear more slowly
than components on aircraft that are used for high-stress maneuvers in harsh
environments such as the deserts of Iraq. The IMD-HUMS tracks these regimes and,
through study, users may be able to determine which components wear, under which
regimes and at what rate. Through this capability, flight readiness and safety
are enhanced through the early identification of degraded components (IMD-HUMS User
Manual, 2005).
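
Regime-dependent usage can be pictured as flight hours weighted by how demanding
each regime is. The sketch below is only an illustration of that idea: the regime
names, the severity factors, and the function weighted_usage are hypothetical and are
not taken from the IMD-HUMS documentation.

```python
def weighted_usage(regime_hours, severity):
    """Sum flight hours per regime, each scaled by a severity factor."""
    return sum(hours * severity[regime] for regime, hours in regime_hours.items())


# Hypothetical severity factors: 1.0 means one hour counts as one usage hour.
severity = {"straight_and_level": 1.0, "hover": 1.5, "high_stress_maneuver": 4.0}

# Two aircraft with the same 100 logged flight hours but different regime mixes.
benign = {"straight_and_level": 80, "hover": 15, "high_stress_maneuver": 5}
harsh = {"straight_and_level": 40, "hover": 20, "high_stress_maneuver": 40}

print(weighted_usage(benign, severity), weighted_usage(harsh, severity))
```

With these assumed factors, the two aircraft accumulate very different effective
usage even though their logged flight time is identical, which is the point the
paragraph makes about harsh operating conditions.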

2. Concept of Operations 

The IMD-HUMS provides an automated capability to monitor, diagnose and track 
usage for many components of a helicopter. IMD-HUMS sensors installed on the
helicopters collect data during flight operations. The initial acquired
measurements are physical in nature: motion, rates of motion, and forces. An acquisition
is the record of a specific set of these measurements over a fixed period of time. For each
acquisition, the IMD-HUMS manipulates these readings through proprietary algorithms
to compute CI and, from these, an HI for each component. The CI are values which depict a
certain aspect of a component's state and are calculated from the raw data of physical
measurements. The CI are aggregated to produce a component's Health Indicator (HI).
This collection of CI and HI for each acquisition is then used for maintenance
diagnostics.

The two main sub-systems of the IMD-HUMS are the On-Board System (OBS) 
and the Ground Station System (GSS). The OBS is physically located on the helicopter 
and consists of a cockpit display unit (CDU), a data transfer unit (DTU) and data
transfer memory unit (DTMU), a remote data concentrator (RDC), a main processor unit
(MPU), two junction boxes (JB1/JB2), 30 accelerometers, main and tail rotor magnetic
RPM sensors, a main rotor blade tracker, and engine output shaft optical tachometers.
The GSS is external to the helicopter, runs on a PC and consists of the computer
hardware and software that reads and processes the data collected from the OBS
(Figure 1) (IMD-HUMS User Manual, 2005).

Figure 1. The Components of the IMD-HUMS (from System Users Manual for
IMD-HUMS) 

Each of the many components of an operating helicopter produces an associated
vibration, and it is these vibrations that the IMD-HUMS measures. IMD-HUMS data
collection begins
at the various aircraft sensors. For instance, the sensors used for data collection from the 
electrical generators are accelerometers; they are located on the transmission accessory 
gear box modules, one for each of the two generators (Figure 2). These accelerometers 
are used to measure the specific vibrations which come from all the internal components 
such as gears, shafts and bearings throughout the transmission accessory gear box 
module, not just the electrical generators. Data collected by the accelerometers is then 
sent directly to the MPU for processing. 

Figure 2. Location of IMD-HUMS Accelerometers on UH-60L Helicopter (from
System Users Manual for IMD-HUMS) 


The MPU is located in the aircraft's transition section avionics bay. It is the brain 
of the OBS portion of the IMD-HUMS. The MPU receives data from the accelerometers 
and performs the following tasks: conversion of analog data into digital data; recognition 
of flight regime and determination of regime duration; conversion of data into CI;
recognition of vibration exceedances; and the storage of data for transfer to the DTU.
After the data has been processed by the MPU, the resulting outputs are referred to as 
acquisitions (IMD-HUMS User Manual, 2005). This data is in raw data file (rdf) format. 
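
Two of the MPU tasks listed above, reducing digitized data to indicator values and
recognizing exceedances, can be illustrated with generic vibration statistics. The
sketch below is not the IMD-HUMS algorithm, which is proprietary; root mean square and
peak-to-peak amplitude are used here only as familiar stand-ins for CI, and the signal
and limits are invented for the example.

```python
import math


def rms(samples):
    """Root mean square amplitude of a digitized vibration signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def peak_to_peak(samples):
    """Difference between the largest and smallest sample."""
    return max(samples) - min(samples)


def exceeds(value, limit):
    """True when an indicator is above its (hypothetical) exceedance limit."""
    return value > limit


# Synthetic digitized signal: 5 full cycles of a sine wave with amplitude 2.0.
n = 1000
signal = [2.0 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

indicators = {"rms": rms(signal), "peak_to_peak": peak_to_peak(signal)}
limits = {"rms": 1.0, "peak_to_peak": 5.0}  # assumed limits, not IMD-HUMS values
flags = {name: exceeds(value, limits[name]) for name, value in indicators.items()}
print(indicators, flags)
```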

The Ground Station System consists of all the software and hardware associated 
with the analysis of the acquisitions not located on the helicopter. Once the acquisitions 
are downloaded from a DTMU, this data and all other data from all flights of all aircraft 
using the IMD-HUMS are available for analysis. The GSS will automatically generate 
some of the required maintenance actions resulting from an IMD-HUMS equipped 
aircraft's flight (IMD-HUMS User Manual, 2005). 


C. UH-60L HELICOPTER AND ELECTRICAL GENERATORS 

The UH-60L (Blackhawk) (Figure 3) is a twin turbine engine, single rotor,
semi-monocoque fuselage helicopter. The primary mission capability of the helicopter is
tactical transport of troops, supplies and equipment. Secondary missions include training, 
mobilization, development of new and improved concepts, and support of disaster relief. 
The US Army alone has over 1,900 H-60 helicopters in its inventory (International 
Institute for Strategic Studies, 2005). The incorporation of IMD-HUMS into the H-60 
fleet is a major financial investment with great implications concerning the maintenance 
practices of these helicopters. 

Figure 3. UH-60L Blackhawk Helicopter (from Operators Manual for UH-60L
Helicopter) 


There are two electrical generators in each UH-60L helicopter (Figure 4). They 
are mounted on and driven by the transmission accessory gear box module. Each is 
capable of supplying the total helicopter power requirements (Operators Manual for
UH-60L Helicopter, 2003). The main components associated with the electrical generators
are as follows: the spur gear, located in the transmission accessory gear box module, which
transfers rotational power to the generator shaft; the bearings, which support and stabilize
the generator shaft; and the generator shaft itself, which rotates along with mounted brushes
to produce electricity.

Figure 4. UH-60L Generator (after Intermediate Maintenance Repair and Special
Tools List for UH-60L) 


D. PHYSICS OF VIBRATIONS AND EXPLANATION OF TERMS 

This section provides an overview of the basic physical concepts, terms and tools 
used in CBM and specifically the IMD-HUMS. These concepts are used to describe the 
important CI computed by IMD-HUMS and used in this thesis. Also explained is how
these CI are used to assess the health of a component.

1. IMD-HUMS and Mechanical Vibrations

An oscillation is the variation, usually with time, of the magnitude of a quantity 
with respect to a specified reference when the magnitude is alternately greater and 
smaller than the reference (Harris, 2002). A vibration is an oscillation where the varying 
quantity is the parameter that defines the motion of a mechanical system (Harris, 2002). It
is the vibrations from operating components which the IMD-HUMS acquires for analysis.
A rotating high-speed engine shaft, a main transmission gear turning, and a main tail 
rotor blade rotating, twisting, and flapping in multiple directions all produce some type of 
vibration. The gears, shafts and bearings of the UH-60L generators, which are the 
components chosen for this study, also produce vibrations when in operation. 

The IMD-HUMS uses accelerometers, also known as piezoelectric transducers,
to measure mechanical vibrations. Specifically, they measure the rate of change of
velocity, or acceleration, of a component in a particular direction.
Accelerometers convert physical acceleration into analog electrical voltages. These
accelerations oscillate over time; hence the resulting motion is a vibration (Collacott,
1979).

The peak-to-peak (P2P) value of a vibrating quantity is the algebraic difference 
between the extremes of the quantity (Harris, 2002). The IMD-HUMS considers the 
peak-to-peak value of vibrations because this value tends to increase when vibrating 
components begin to fail. 

The term envelope (Env) refers to the fact that the background signals are 
removed from a vibration leaving only the portion of the vibration which is to be focused 
upon or analyzed (Harris, 2002). The IMD-HUMS will extract the envelope signal for 
some of its outputs. 

Probability Density Function (pdf) and kurtosis are statistical concepts applied to 
vibration analysis. All vibrations have a characteristic pdf which characterizes the 
probability of a specific instantaneous vibration occurring. Vibrations of good operating 
components usually have pdfs with a bell-shaped curve. Deviations from the bell-shaped 
curve can be used to indicate failing or degrading components. The standardized fourth
moment, or kurtosis, of the curve is best suited to capture these deviations. This approach has been
particularly useful in the vibration analysis of bearings (Rao, 2004). 
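
As an illustration of why kurtosis is sensitive to impulsive defects, the following minimal sketch computes the standardized fourth moment for two synthetic signals. The signals, the impulse size and the impulse spacing are hypothetical and are not IMD-HUMS data; the function is the textbook definition, not the Goodrich algorithm.

```python
import random


def kurtosis(samples):
    """Return the standardized fourth moment m4 / m2**2 of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / m2 ** 2


# Hypothetical "healthy" vibration: bell-shaped (Gaussian) noise.
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Hypothetical "damaged" vibration: the same noise plus a periodic impulse,
# loosely imitating a defect struck once per revolution.
damaged = [x + (8.0 if i % 250 == 0 else 0.0) for i, x in enumerate(healthy)]

kurtosis_healthy = kurtosis(healthy)  # close to 3 for a bell-shaped pdf
kurtosis_damaged = kurtosis(damaged)  # larger, because the tails are heavier
```

A bell-shaped pdf has a kurtosis near 3, so values well above that suggest the heavy tails produced by repeated impacts.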

The term meshing is used to define the working contact or the fitting together and 
interactions of gears. Meshing of gears results in vibrations which the IMD-HUMS 
measures. 

2. Condition Indicators and Health Indicators 

Condition Indicator/Indication(s) (CI) and Health Indicator/Indication(s) (HI) are
terms developed by Goodrich Corporation. The CI are variables computed by IMD-HUMS
from the raw vibratory data. They are used as a measure of a component's state at
the time of acquisition. There are several types of CI. The important CI used in this
study are described in the following paragraphs. Up to eight different CI are used by the
IMD-HUMS to calculate a value which summarizes a component's overall state, known
as a HI. For each specific component there is a proprietary algorithm developed by
Goodrich Corporation which determines exactly how its HI is computed. HI are scaled to
have values between 0 and 1. During the time period in which data was collected, a HI
value between 0.0 and 0.32 is normal (operating fine), between 0.33 and 0.66 is called a
warning, and between 0.67 and 1.0 is called an alarm (software changes subsequent to the
data collection period have resulted in changes to the HI scale).
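
The HI bands described above can be sketched as a small lookup. The function name is hypothetical, and rounding the HI to two decimals is an assumption made here so that the quoted bands (0.0 to 0.32, 0.33 to 0.66, 0.67 to 1.0) leave no gaps; it is not taken from the IMD-HUMS software.

```python
def hi_status(hi):
    """Map a Health Indicator in [0, 1] to the status bands quoted in the text."""
    if not 0.0 <= hi <= 1.0:
        raise ValueError("HI is expected to lie between 0 and 1")
    value = round(hi, 2)  # assumption: HI is reported to two decimals
    if value <= 0.32:
        return "normal"
    if value <= 0.66:
        return "warning"
    return "alarm"


statuses = [hi_status(h) for h in (0.10, 0.33, 0.66, 0.67, 1.0)]
```

These bands apply only to the data collection period studied here, since the HI scale was changed afterwards.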

Shaft Order 1 (SO1) is a measurement used to detect dynamic imbalances and
shaft misalignment with supporting structures (usually bearings) of a shaft. It has
dimensions of distance per unit time, measured in IPS (inches per second). A single
oscillation in the resulting vibration occurs (order 1) for each complete shaft revolution
when an imbalance and/or misalignment exists. These imbalances and misalignments are
a result of wearing and degrading shafts and bearings (Harris, 2002).

Shaft Order 2 (SO2), like SO1, is a measure used in detecting shaft misalignment
with supporting structures in a shaft. It has dimensions of distance per unit time,
measured in IPS. Two oscillations in the resulting vibration (order 2) result for each
complete shaft revolution when a misalignment exists (Harris, 2002).
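
The idea of a shaft order can be sketched with a single-frequency Fourier sum. This is a generic order-analysis illustration, not the IMD-HUMS algorithm. It assumes a hypothetical velocity signal sampled a fixed number of times per shaft revolution over a whole number of revolutions, and the amplitudes 0.5 and 0.2 are invented for the example.

```python
import cmath
import math


def order_amplitude(signal, samples_per_rev, order):
    """Amplitude of the component that repeats `order` times per shaft revolution."""
    total = sum(
        x * cmath.exp(-2j * math.pi * order * n / samples_per_rev)
        for n, x in enumerate(signal)
    )
    return 2.0 * abs(total) / len(signal)


SAMPLES_PER_REV = 64  # assumed sampling, synchronous with the shaft
REVOLUTIONS = 20      # whole number of revolutions, so orders do not leak

# Hypothetical signal: a once-per-revolution term plus a twice-per-revolution term.
signal = [
    0.5 * math.sin(2 * math.pi * n / SAMPLES_PER_REV)
    + 0.2 * math.sin(2 * math.pi * 2 * n / SAMPLES_PER_REV + 0.7)
    for n in range(SAMPLES_PER_REV * REVOLUTIONS)
]

so1 = order_amplitude(signal, SAMPLES_PER_REV, 1)  # recovers the 0.5 term
so2 = order_amplitude(signal, SAMPLES_PER_REV, 2)  # recovers the 0.2 term
```

A growing order 1 amplitude corresponds to the imbalance and misalignment signature described for SO1, while a growing order 2 amplitude corresponds to the SO2 misalignment signature.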

Residual Peak-to-Peak (Res_P2P) is a measurement of displacement (distance
dimension) in a vibration. The term "residual" speaks to the fact that the strong tones are
first removed from the vibration leaving only the portion of the peak-to-peak
displacement which results from regularly existing background vibration (Harris, 2002;
PI VPU/DTD Software Requirements Specifications, 2001).

The ball energy measurement results from defects of a spinning ball bearing. This 
measurement is used to detect defects or wear in the bearings and is in the dimensions of 
force, distance and time (Rao, 2004). 

Envelope peak-to-peak (Env.P2P) is a measure of the periodic impulses due to 
bearing defects. Background signals within the vibration are first removed from the 
vibration, leaving only the portion of the vibration which best depicts the bearing defect. 
Envelope peak-to-peak is in the dimension of distance (Rao, 2004). 

Envelope Kurtosis (Env.Kurtosis) is a measurement of how the periodic impulses 
due to bearing defects affect the curve of the pdf of the bearings' total vibration. Kurtosis 
measures the thickness of the tails of the distribution of bearing vibrations after the 
background signals have been removed (Harris, 2002). 

Envelope Distributed Fault (Env.DF) is a dimensionless ratio of the standard
deviations of the envelope data (data after background signals are removed) and all raw 
data (the total vibration). This measurement is used in the analysis of bearing defects. 
The term "distributed" refers to the fact that all possible directions of displacement are 
considered in this measurement (Harris, 2002). 

Gear Distributed Fault (GDF) is a dimensionless measurement resulting from the
ratio of unexplained and explainable variances of a vibration resulting from the meshing 
of gears. It is believed that this measurement is an indication of gear teeth wear and 
cracks (Harris, 2002). 

The G2-1 measurement is a result of an algorithm which considers the average
peak-to-peak and energy output of a vibration resulting from the meshing of gears. It is 
used in the analysis of gears. The term was developed by Goodrich Corporation and the 
algorithm which determines its value is proprietary (PI VPU/DTD Software 
Requirements Specifications, 2001). 

Gear Misalignment 1 (GearMis_1) is a dimensionless measurement resulting from
the ratio of the energies of the vibrations produced when gears mesh (Harris, 2002).


III. DATA ANALYSIS


This section details the process of data analysis for this thesis. It begins with an 
explanation and description of the data and how the data are partitioned into a training 
data set and an experimental data set. Next, for the training data a brief graphical 
exploration of the differences between good and bad generators as well as differences 
among bad generators is given. The remainder of the chapter deals with variable 
selection and the fitting of the logistic regression and forest of trees models. 


A. DATA 

1. Data Collection 

The authors first visited the US Army unit conducting the operational test of the 
IMD-HUMS. The soldiers of this unit, 4th Regiment 101st AVN Division, are the 
operators and maintainers of the 30 UH-60L helicopters which have IMD-HUMS 
installed. During the ten-day visit the components and concept of operations of the 
system were explained, the operation of the system was witnessed, and the IMD-HUMS 
data output was shown. The authors were permitted to fly aboard one of the helicopters 
during which time the data collection process from beginning to end was demonstrated 
and explained in detail. The soldiers then explained the unit-level data analysis and 
maintenance practices which result from these data collections. They also provided 
several specific cases of successful implementation of the system and cases of interest for 
possible study. Of particular interest were six electrical generators which have been 
replaced for cause. The IMD-HUMS data concerning these replaced generators provided 
an opportunity to determine whether the data can predict the cause and/or need for 
generator replacement. 

In the two years of IMD-HUMS use in the 30 UH-60L helicopters, data has been 
collected on 66 different electrical generators. In these two years six generators were 
removed from operations for some reason of fault; the remaining 60 generators were 
deemed to be working properly. Where available, pre-fault data, maintenance records
and photographs were used to explain the circumstances of the faulted generators.
Photographs show that some of the faulted generators had worn or totally broken
components. In addition, the maintenance history of each of these faulted generators was
investigated. Table 1 provides a summary of the cases of each of these six faulted
generators. The failures of two of the generators, numbers 9 and 33, were detected during
operation by a generator warning light. Faults in the remaining four generators, numbers
22, 31, 53 and 56, did not trigger the generator warning light. However, each of the four
generators had unusually high SO1 readings upon removal. Three of these generators,
numbers 22, 31 and 53, showed evidence of fault or wear. The removal of generator
number 56 resulted from the case of an identifiable buzz explained earlier in Chapter II.

Table 1. Confirmed Bad Generators

| Aircraft / Side | Generator # | Reported Comments |
|---|---|---|
| 450 Left | 9 | #1 generator failed during shutdown upon APU generator coming on during start; replaced #1 generator (1) |
| 829 Left | 33 | #1 generator bad; replaced generator (1) |
| 518 Left | 22 | SO1 near 2 IPS so replaced Spline Adapter Coupler during next scheduled maintenance. Evidence of wear and possible improper installation. SO1 returned to .05 IPS after replacement. |
| 549 Left | 31 | Replaced Spline Adapter Coupler due to SO1 at 3 IPS. Adapter severely worn and two 1 inch cracks found. IPS still high after Adapter replacement so generator also replaced. (1) |
| 515 Right | 53 | While getting modified with IMD-HUMS, vibration was noted, found to have SO1 at 3 IPS. Adapter Coupler was replaced (had some wear) and SO1 vibrations dropped below .05 IPS. (2) |
| 518 Left | 56 | 3 Jan 04 Mosul: had a weird buzz on left-hand side ceiling, isolated to generator (found to have SO1 over 4 IPS). Generator and coupling replaced. (2) |

Source:
(1) Maintenance Records
(2) Johnny Wright and Ground Station Team, IMD-HUMS Fault Detections,
Goodrich Corporation. Draft 5/25/2005 (Ver 117)

2. Data Description

More than 60GB of data, which consists of acquisition data for all the helicopters'
monitored components, was sent to NFS in rdf format. The data readings concerning
the generator shaft, spur gear and bearings were then extracted and converted to comma
separated value (csv) format for data exploration and analysis. Each IMD-HUMS
acquisition concerning the shaft, spur gear and bearings of a generator results in 169
variables. The 169 variables are listed in Appendix A.


[Figure 5: data layout with a RESPONSE column followed by POTENTIAL PREDICTORS columns (X2, ..., X169); 36,743 rows]

Figure 5. Example of the IMD-HUMS Data in CSV Format


Each generator is assigned a number, 1 through 66, for ease of identification and 
data manipulation. These numbers were then incorporated into the data set. Among the 
169 variables recorded for each acquisition are the Health Indicators for the gear, bearing 
and shaft. Some generators had acquisitions which numbered in the tens, others in the 
hundreds, and others in the thousands. In total, for all 66 generators, there are 36,743 
separate data acquisitions from the two-year period during which the IMD-HUMS were 
installed. Two generators, numbers 23 and 34, were removed because they
had fewer than 20 acquisitions during their time of operation, leaving data from
64 generators.

The data set is divided into two separate groups: the "training" set, used to
develop models which predict whether a generator is faulty or not based on CI, and an
"experimental" set, used to test how well these models actually predict whether a
generator is faulty or not.

3. Training Data Set 

Each generator in the training set is assigned a binary value of 1 or 0 to classify
its known state. The value of one is given to the generators removed for fault,
henceforth referred to as bad generators. The value of zero is given to the generators not 
removed, referred to as good generators. 

A complication of this binary classification system is that there may be bad 
generators, ones which will eventually fail, classified as good because their faulty 
condition has not yet been identified. The large number of good generators included in 
the training set serves as protection from these errors, diminishing the influence of any 
incorrectly classified generators. This is a critical assumption in the analysis. The fact 
that each generator is assigned a state of 0 (good) or 1 (bad) does not mean these 
generators are actually in the assigned state. The assigned state of 0 (good) or 1 (bad) is 
based strictly upon whether a generator was removed for fault or not. A generator with 
an undetected fault would be assigned a state of 0 (good). Likewise a generator which 
was replaced for a reason of fault and assigned a state of 1 (bad) could actually have been 
mechanically good; perhaps the electrical contacts or wiring could have had a short- 
circuit. This is the reason the authors investigated the circumstances and maintenance 
actions of each of the replaced generators. 

The training set consists of data from 52 of the 64 generators. Five of the training
set generators had been taken off their helicopters for fault and are classified as bad; the 
remaining 47 training set generators had no known faults throughout their data history 
and are classified as good. 

Only CI computed in the last 20 acquisitions of each generator of the training set
were used in the development of the prediction models. This is because the faulted
generators, classified as bad, most likely were not bad throughout their entire two-year
history. By restricting analysis to the last 20 acquisitions, the risk of including
observations from bad generators gathered before the fault occurred is reduced. The
choice of 20 acquisitions is a judgment call made by the authors after inspecting the
general trend of CI and HI. This reduced the training set to a total of 1,040 acquisitions
(52 generators with 20 acquisitions each).
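
The restriction to the most recent acquisitions can be sketched as follows. The record layout (generator number, timestamp, CI values) and the synthetic histories are hypothetical; only the counts of 52 generators and 20 acquisitions come from the text.

```python
def last_acquisitions(rows, count=20):
    """Keep the `count` most recent acquisitions of each generator.

    `rows` is an iterable of (generator_id, timestamp, ci_values) tuples.
    """
    by_generator = {}
    for row in rows:
        by_generator.setdefault(row[0], []).append(row)
    kept = []
    for generator_id in sorted(by_generator):
        # Sort each generator's history by time, then keep the tail.
        history = sorted(by_generator[generator_id], key=lambda r: r[1])
        kept.extend(history[-count:])
    return kept


# Synthetic example: 52 generators with histories of different lengths.
rows = [(g, t, {"SO1": 0.1}) for g in range(1, 53) for t in range(20 + g)]
training = last_acquisitions(rows)
training_size = len(training)  # 52 generators x 20 acquisitions
```

Keeping a fixed number of acquisitions per generator also gives every generator the same weight in the training set, regardless of how long its history is.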

4. Experimental Data Set 

The experimental set consists of data from the remaining 12 generators. One
generator, number 33, was taken off its helicopter for fault and the remaining 11
generators in the experimental set worked properly throughout their data history.
However, six of these 11 generators were put on what the users called the "watch list,"
the list of generators with questionable status (Table 2). The watch list consists of
generators which show generator shaft CI or HI values which indicate that perhaps these
generators are beginning to degrade. Two of the generators, numbers 30 and 21, are
considered to be in a priority status due to shaft order 1 (SO1) readings above 2.0 IPS.
The other four watch-list generators have SO1 readings above 1.5 IPS. These generators
are included in the experimental set to make a final determination of their status using the
prediction model.

The five remaining good generators in the experimental data were on the opposite
side of the aircraft from four of the watch-list generators and the one faulty generator.

Table 2. Generator Watch List

| Aircraft / Side | Generator # | Shaft Order 1 Vibrations (IPS) as of 5/27/2005 |
|---|---|---|
| 545 Left | 30 | reached 2.3 IPS and is increasing at 2.2 IPS per 100 RTH |
| 516 Left | 21 | reached 2.5 IPS and is increasing at 0.5 IPS per 100 RTH |
| 441 Left | 6 | reached 1.5 IPS and increasing at 0.4 IPS per 100 RTH |
| 516 Right | 55 | reached 1.78 IPS and is increasing at 0.1 IPS per 100 RTH |
| 493 Right | 48 | reached 1.85 IPS and is not increasing |
| 519 Left | 24 | reached 1.55 IPS and is increasing at less than 0.1 IPS per 100 RTH |

RTH: Rotor Turn Hours

Source: Harrison Chin, Dave Green, Eric Mayhew, Johnny Wright, Generator Shaft
Analysis: Expanded Survey Including #441, #515, #516, #518, #519 and #545,
Goodrich Corporation, Draft 5/27/2005 (Ver 3)


B. GRAPHICAL ANALYSIS 

Projection Pursuit (Hastie, Tibshirani & Friedman, 2001), implemented by the
statistical software Ggobi, is used to gain a visual perspective of the relationships among
the variables. Ggobi plots two-dimensional projections of multi-dimensional data. The 
projection pursuit algorithm numerically searches for two-dimensional projections which 
maximize one of several possible measures of interest. These projections are displayed 
graphically and the plot is continually updated as the algorithm pursues “optimal” 
projections. The display is interactive, and Ggobi allows the user to stop the display and 
manually change the projection at any point. By using projection pursuit several insights 
are gained concerning the data. 

Projection Pursuit is first used to study the five bad generators from the training
set. Five CI (SO1, SO2, Env.P2P, GearMis_1, and Ball Energy) from the
169 variables relating to generators are used as input variables. The SO1 and SO2
variables are commonly accepted indicators of shaft condition. The remaining
three variables are used to address the condition of the gears and bearings.

Figure 6. Ggobi Display - Clusters of Bad Generators


Figure 6 shows the Ggobi graphical display of the projection of these five 
variables for the five bad generators of the training set. The figure shows that four of the 
five bad generators form single clusters. Only one of the generators, number 53, forms 
two clusters, one in the upper right of the display, the other in the lower left of the 
display. From this display, one might be tempted to propose that the two clusters 
represent two different time periods. However, this is not the case. 

Let $y_1$ and $y_2$ be the linear functions of the five CI computed by Ggobi in Figure 6.
Further, consistent with Figure 6, let

$$X_4 = \text{SO1}, \quad X_5 = \text{SO2}, \quad X_6 = \text{Env.P2P}, \quad X_7 = \text{GearMis\_1}, \quad X_8 = \text{Ball Energy}.$$

Then $y_1$ and $y_2$ can be computed as linear combinations of $X_4, \dots, X_8$, with the
coefficients given by the Ggobi projection.

Plotting $y_1$ and $y_2$ in acquisition time sequence, Figure 7 clearly shows that the $y_1$ and
$y_2$ values of generator number 53 oscillate between the two groups depicted in Figure 6
over time.


[Figure 7 plot: Ggobi y1 and y2]

Ggobi is also used to investigate clustering of generators classified as good and
bad. The same five CI are again used as input variables. The resulting display, Figure 8,
shows a definite difference in grouping of variables between most of the good generators 
(light grey dots if viewed in the non-color copy or yellow dots if viewed in the color 
copy) and the bad generators (dark grey dots if viewed in the non-color copy or purple if 
viewed in the color copy). However, one bad generator, number 9, seems to be clustered 
with the good generators. 



Figure 8. Ggobi Display - Light gray (yellow) dots are good generators, dark gray 
(purple) dots are bad generators 

C. INITIAL VARIABLE SELECTION PROCESS

With 169 variables initially in the training data set, we reduced the number of 
potential predictors based upon an understanding of the IMD-HUMS, the physical 
operation of the helicopter and generators, and the vibrations they produce. This variable 
reduction is necessary when fitting parametric models such as logistic regression. It is 
also desirable but not strictly required when using certain data mining techniques. The 
169 variables include the HI for the shaft, gear and bearing. The current practice is to 
rely heavily upon the HI of the generator shaft to assess the overall health of the 
generator. A single model incorporating acquisitions from all three components, 
however, might better detect other modes of fault or degradation. 

The classification models are based on variables that describe the state of the 
three main components involved in the operation of the generator: the generator's shaft, 
supporting bearings and supporting spur gear. The authors believe doing so explains the 
overall state of the generator better than separately tracking and assessing each 
component's HI. 

The first step in variable reduction is to eliminate any variables which do not 
originate from, or directly address, one of these three components and their physics of 
operation. For example, consider torque (a measure of power output) readings of each 
engine at the time of acquisition. Once up and running, the electrical generators turn at a 
nearly constant speed, under a nearly constant force, regardless of engine torque. The 
transient run-up time to generator rotational speed is minimal. Therefore the engine 
torque readings are eliminated as possible predictors. Explained another way, changes in 
engine torque are not expected to result in significant changes of generator speed or 
forces. The same reasoning is applied to eliminate other variables. For example, 
acquisition date/time, aircraft tail number, airspeed, main rotor speed, outside air 
temperature, main gear box temperature, and flight regime are all eliminated. 

It seems this reasoning should also be applied in determining whether position of 
the generator (left or right side of the helicopter) should be included as a variable. The 
left and right generators are identical and interchangeable in all physical aspects. The
only distinction between them is their name "left" or "right" given by the side of the
helicopter they are installed on. However, graphical analysis of certain CI, particularly
Residual Peak-to-Peak, shows clear differences in both mean and variance of these values
between left and right generators (Figure 9).




[Figure 9 plot; axis label: Generator Number]

Figure 9. Dot Plot of the Residual Peak-to-Peak CI for Each Generator


The differences may be caused by slight variations in the way that complicated
vibrations are transmitted from the accelerometer to the MPU. While the variable
indicating left or right side of the aircraft is not explicitly included in the analysis, the
left/right position is implicitly captured with variables such as residual peak-to-peak and
envelope distributed fault.

Another method of eliminating variables is to drop any redundant or nearly
redundant variables. For instance, shaft orders one, two, three and one-half are all 
calculated in three different scales: IPS, OBS, and G forces. Each is a constant multiple 
of the other. Thus the shaft order readings in the scale of IPS were kept while the others 
were dropped. Normalized versions of the variables ball energy, cage energy, inner race 
energy, outer race energy and total bearing energy were also dropped since non- 
normalized readings for each of these exist in the data set. 

By eliminating redundant variables and those not directly involved with the 
generator shaft, gear and bearing, the 169 variables were reduced to 65 variables. 
Appendix A is a listing of all 169 variables with the 65 remaining variables highlighted. 

However, redundancies still exist among the remaining variables. For example,
computation of the sample correlations between the 65 predictor variables gives 16 pairs
of variables with sample correlations greater than 0.90. These high correlations are an
indication of multicollinearity among the predictors. In addition, the principal
components of the standardized variables (Hastie, Tibshirani & Friedman, 2001) indicate
that the first 10 principal components account for 68% of the variability of the 65
variables (Figure 10). Over 95% of the variability can be captured with 34 components.
This confirms our suspicion that generator condition can be captured in fewer dimensions
than the current data set. Figure 10 shows the percentage of variance captured in the first
ten principal components of the 65 standardized predictor variables.
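
The correlation screen described above can be sketched as a search over all pairs of predictors. The column names and values below are hypothetical, and comparing the absolute value of the correlation with the threshold is an assumption made for this sketch.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.90):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(sorted(columns.items()), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical predictors: "so1_g" is a constant multiple of "so1_ips",
# imitating the same shaft order reported in two different scales.
columns = {
    "so1_ips": [0.1, 0.4, 0.2, 0.9, 0.5],
    "so1_g": [0.2, 0.8, 0.4, 1.8, 1.0],
    "ball_energy": [3.0, 1.0, 4.0, 1.0, 5.0],
}
pairs = highly_correlated_pairs(columns)
```

Each flagged pair is a candidate for dropping one of its two variables before a parametric model is fit.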

[Figure 10 plot: Relative Importance of Principal Components; bars for Comp.1 through Comp.10, with one bar labeled 0.173]

Figure 10. Variance Captured in First Ten Components of Data Set Containing 65
Predictor Variables

Table 3. Pairwise Sample Correlations

| CI | Correlation | Classification |
|---|---|---|
| Shaft Order 2 | 0.690 | Strong |
| Shaft Order 1 | 0.654 | Medium |
| Shaft Order 3 | 0.491 | Medium |
| Envelope Peak to Peak | 0.376 | Medium |
| Gear Distributed Fault | -0.305 | Weak |
| Base Energy | 0.275 | Weak |
| Ball Energy | 0.252 | Weak |
| Bearing Energy | 0.197 | Weak |


Pairwise sample correlations of the 65 predictors with the binary response 
variable indicating good or bad are also computed. Of these, the variables with the 
highest correlation are given in Table 3. Correlations with magnitude from 0.0 to 0.33 are
classified as weak, from 0.34 to 0.66 as medium, and from 0.67
to 1.00 as strong.

This analysis of correlation provides indications of useful variables for the
models. They are shaft orders 1, 2 and 3, which result from vibrations of the generator
shaft and, in the case of shaft order 1, from the bearings also. Shaft order 2 has the
strongest correlation with the response variable of all 65 predictors. Shaft orders 1 and 3
have the next highest correlations, classified as medium. Envelope Peak-to-Peak, which
results from vibrations of the bearings, shows the next highest correlation, also classified
as medium. The remaining predictors have differing measures of weak correlation with
respect to the response.

D. LOGISTIC REGRESSION MODEL


Let $n = 1{,}040$ be the total number of observations in the training data set, and let
$Y_i$, $i = 1, 2, 3, \dots, n$, represent the binary random variable indicating whether the
$i$th observation comes from a bad generator ($Y_i = 1$) or a good generator ($Y_i = 0$). The
logistic regression model assumes that the $Y_i$ are independent Bernoulli variables with
$\pi_i = P(Y_i = 1)$ for $i = 1, 2, 3, \dots, n$. In addition, the logistic regression model "links" $\pi_i$ to
the observed values $x_{i1}, x_{i2}, \dots, x_{ik}$ of the $k$ predictors for the $i$th observation as
follows:

$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \qquad i = 1, 2, 3, \dots, n,$$

where $\beta_0, \beta_1, \dots, \beta_k$ are the $k + 1$ parameters or coefficients to be estimated.


The benefit of using logistic regression is that it can be used to estimate
$\pi_i$, the probability that the $i$th observation comes from a bad generator rather than a good
generator.
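
Solving the link equation for the probability gives $\pi_i = 1 / (1 + e^{-\eta_i})$, where $\eta_i$ is the linear combination of the predictors. A minimal sketch of that inversion follows; the function name and the coefficient values are hypothetical and are not the fitted model from this study.

```python
import math


def bad_generator_probability(coefficients, predictors):
    """Return pi = 1 / (1 + exp(-eta)), with eta = b0 + b1*x1 + ... + bk*xk."""
    intercept, *slopes = coefficients
    if len(slopes) != len(predictors):
        raise ValueError("one slope is required per predictor")
    eta = intercept + sum(b * x for b, x in zip(slopes, predictors))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical one-predictor model: intercept -3.0 and slope 2.0 on SO1 (IPS).
p_at_boundary = bad_generator_probability([-3.0, 2.0], [1.5])  # eta = 0
p_high_so1 = bad_generator_probability([-3.0, 2.0], [3.0])     # eta = 3
```

An acquisition would then be flagged as coming from a bad generator when the estimated probability exceeds a chosen cutoff.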

There is one assumption for logistic regression which our application of the model
violates heavily. Logistic regression requires that the $Y_i$ be independent of one another.
Time-series collections, and the method of classifying an entire generator, not each
individual acquisition, as good or bad create an unusual dependency between acquisitions 
within each generator. To fit the models, the last 20 acquisitions from each generator in 
the training set are used, thus violating independent sampling. For instance, a single 
worn or damaged ball bearing wears more and more with continued operation. Further 
acquisitions depicting more wear and damage will result. Therefore the state of a 
component is dependent upon its past state. However, here logistic regression is used to 
compute summary statistics rather than for inference. Thus the real proof of the utility of 
using this approach lies in how well it predicts problems in the generators in both the 
training and experimental data sets. 

Two approaches are used to fit the logistic regression model. The first method
creates a compact model which has no correlation or left/right generator issues; however, 
the second method is chosen for the final model due to better performance. 

The first method forces inclusion of shaft, gear, and bearing CI. Three logistic
regression models are fit: one with CI originating only from the bearings, another with CI
originating only from the gear, and still another with CI originating only from the shaft.
Backwards elimination is used to select variables for each of these models, i.e. the CI
with the greatest p-value is eliminated from the model at each step of the backwards
elimination procedure. The end result is 20 variables for the bearings, 14 variables for
the gear and 5 variables for the shaft. The purpose of fitting separate models based on CI
from the three separate components is to ensure that potential predictors for each 
component are included in the final model. These three sets of variables are then 
combined, and another logistic regression model is fit using backwards elimination for 
variable selection. With each logistic regression printout Null Deviance (ND) minus 
Residual Deviance (RD) is considered. In logistic regression fits where all modeling 
assumptions such as independence apply, a small RD is desired but not at the expense of 
an over-fit model. Including all or too many of the potential variables would result in 
over-fitting; the resulting model would predict the training data set very well but would 
include unnecessary variables and may not be usable for predictions on other data. 

This process gives a model with only five predictors: SO1, SO2, GearMis_1, Ball
Energy, and Env.P2P. These CI have low pairwise correlation, and the variable indicating
left/right generator is not needed, but the performance is inferior to that of the final
model (Table 4).

Table 4. Comparison of Logit Model Performance

| Model | Logit Model Fitting Criteria |
|---|---|
| Over Fit: 65 variables | Null Deviance: 658.4219 on 1039 degrees of freedom; Residual Deviance: 0 on 974 degrees of freedom |
| Under Fit: 5 variables | Null Deviance: 658.4219 on 1039 degrees of freedom; Residual Deviance: 213.6768 on 1034 degrees of freedom |
| Final Model: 10 variables (fitted with generator # 7 classified as bad, explained in Results Chapter) | Null Deviance: 743.8645 on 1039 degrees of freedom; Residual Deviance: 77.71993 on 1029 degrees of freedom; Likelihood Ratio Test: Chi-Square 666.145, degrees of freedom 10, Significance .000 |


The second logistic regression model was fit by the following process. We began
with the 65 variables determined after initial variable elimination. Further elimination of
redundant or similar variables led to the removal of 16 bearing and 2 gear predictors.
These variables were eliminated because the pool of predictors included other variables
derived from the same vibration, differing from the dropped variables only in the
algorithm from which they are derived. For instance, the gear variable "AM kurtosis" was
dropped because "derivative AM kurtosis" was also present.

A logistic regression model was then fit in Clementine using the 47 variables left
in the predictor pool. Backwards elimination was again used to eliminate variables
further, leaving 12 CI. At this point the classification error rate of the model was also
monitored so as to choose the final number of predictor variables in a backwards-stepwise
fashion. Variables continued to be eliminated from the model as long as the
misclassification rate stayed low. When the output showed an increase in the
misclassification rate, the last eliminated variable was reinstated in the model and that
model was deemed best. Using this method, "Envelope Crest Factor" and "Shaft Order 3"
were eliminated, resulting in a final logistic regression model with 10 variables, an
overall correct classification rate of 99%, and only one observation from a bad generator
classified as good.
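
The stopping rule described above can be sketched in Python as follows. This is a minimal illustration, not the Clementine procedure itself: the function name, the fixed removal order, and the toy error function are all hypothetical, and "an increase in the misclassification rate" is interpreted here as any rise above the lowest rate seen so far.

```python
from typing import Callable, List, Sequence


def eliminate_while_error_low(
    variables: Sequence[str],
    removal_order: Sequence[str],
    error_rate: Callable[[List[str]], float],
) -> List[str]:
    """Drop variables one at a time until the misclassification rate rises.

    `error_rate` is a caller-supplied function (hypothetical here) that refits
    the model on the given variables and returns its misclassification rate.
    """
    current = list(variables)
    best_error = error_rate(current)
    for candidate in removal_order:
        trial = [v for v in current if v != candidate]
        trial_error = error_rate(trial)
        if trial_error > best_error:
            # The rate increased, so the candidate stays in the model
            # (it is "reinstated") and elimination stops.
            break
        current, best_error = trial, trial_error
    return current


def toy_error(selected: List[str]) -> float:
    """Toy stand-in for a model refit: only variable "C" matters."""
    return 0.01 if "C" in selected else 0.05


kept = eliminate_while_error_low(["A", "B", "C", "D"], ["A", "B", "C", "D"], toy_error)
print(kept)
```

In practice the removal order would come from the p-values of each refit rather than a fixed list; the fixed list keeps the sketch short.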

Table 5. Logit Model Classification Rate

| Observed | Predicted 0 (bad) | Predicted 1 (good) | Percent Correct |
|---|---|---|---|
| 0 (bad) | 919 | 1 | 99.9% |
| 1 (good) | 9 | 111 | 92.5% |
| Overall Percentage | 89.2% | 10.8% | 99.0% |


Table 6. Logit Model Fitting Information

Model Fitting Information

| Model | -2 Log Likelihood (Model Fitting Criteria) | Chi-Square (Likelihood Ratio Tests) | df | Sig. |
|---|---|---|---|---|
| Intercept Only | 743.864 | | | |
| Final | 77.720 | 666.145 | 10 | .000 |

Goodness-of-Fit

| | Chi-Square | df | Sig. |
|---|---|---|---|
| Pearson | 6525.330 | 1029 | .000 |
| Deviance | 77.720 | 1029 | 1.000 |

Pseudo R-Square

| Measure | Value |
|---|---|
| Cox and Snell | .473 |
| Nagelkerke | .926 |
| McFadden | .896 |
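
As a check on the pseudo R-square values in Table 6, the short Python sketch below recomputes them from the two -2 log likelihood values using the standard Cox and Snell, Nagelkerke, and McFadden formulas. The sample size of 1040 is inferred from the 1039 degrees of freedom reported for the null deviance in Table 4; the function name is illustrative only.

```python
import math


def pseudo_r_squared(neg2ll_null: float, neg2ll_final: float, n: int) -> dict:
    """Pseudo R-square measures computed from -2 log likelihood values."""
    lr_chi_square = neg2ll_null - neg2ll_final
    cox_snell = 1.0 - math.exp(-lr_chi_square / n)
    # Largest value Cox and Snell can reach for this null model.
    cox_snell_max = 1.0 - math.exp(-neg2ll_null / n)
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / cox_snell_max,
        "mcfadden": 1.0 - neg2ll_final / neg2ll_null,
    }


# Values from Table 6; n = 1040 acquisitions (52 generators x 20 acquisitions).
result = pseudo_r_squared(743.864, 77.720, 1040)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the three values agree with the table (.473, .926, .896).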

Table 7. Logit Model Likelihood Ratio Tests

| Effect | -2 Log Likelihood of Reduced Model | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept | 119.540 | 41.820 | 1 | .000 |
| Shaft Order 1 (IPS) | 145.015 | 67.295 | 1 | .000 |
| Shaft Order 2 (IPS) | 144.412 | 66.692 | 1 | .000 |
| Gear Distributed Fault | 138.842 | 61.122 | 1 | .000 |
| G2-1 | 80.286 | 2.566 | 1 | .109 |
| Residual Peak to Peak | 141.611 | 63.891 | 1 | .000 |
| Gear Misalignment 1 | 94.404 | 16.684 | 1 | .000 |
| Ball Energy | 78.850 | 1.130 | 1 | .288 |
| Envelope Peak to Peak | 184.416 | 106.696 | 1 | .000 |
| Envelope Kurtosis | 117.695 | 39.975 | 1 | .000 |
| Envelope Distributed Fault | 79.924 | 2.204 | 1 | .138 |

The chi-square statistic is the difference in -2 log-likelihoods between the final
model and a reduced model. The reduced model is formed by omitting an effect
from the final model. The null hypothesis is that all parameters of that effect are 0.
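
The likelihood ratio test described in the note to Table 7 can be reproduced with a few lines of Python. The sketch below is limited to effects with one degree of freedom, where the chi-square tail probability has the closed form erfc(sqrt(x/2)); the function name is illustrative, and the final-model value of 77.720 is taken from Table 6.

```python
import math


def lr_test_df1(neg2ll_reduced: float, neg2ll_final: float) -> tuple:
    """Likelihood ratio chi-square and p-value for a single-parameter effect."""
    chi_square = neg2ll_reduced - neg2ll_final
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


NEG2LL_FINAL = 77.720  # -2 log likelihood of the final model (Table 6)

for effect, reduced in [("G2-1", 80.286), ("Ball Energy", 78.850)]:
    chi_square, p_value = lr_test_df1(reduced, NEG2LL_FINAL)
    print(f"{effect}: chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

For G2-1 and Ball Energy this gives chi-square values of 2.566 and 1.130, matching the table, with p-values that round to the tabulated .109 and .288.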

Table 8. Logit Model Parameter Estimates

| | B | Std. Error | Wald | df | Sig. |
|---|---|---|---|---|---|
| Intercept | 37.010 | 7.225 | 26.242 | 1 | .000 |
| Shaft Order 1 (IPS) | -9.369 | 2.025 | 21.407 | 1 | .000 |
| Shaft Order 2 (IPS) | -98.357 | 20.245 | 23.603 | 1 | .000 |
| Gear Distributed Fault | -39.442 | 8.025 | 24.153 | 1 | .000 |
| G2-1 | .019 | .013 | 2.370 | 1 | .124 |
| Residual Peak to Peak | 1.004 | .207 | 23.473 | 1 | .000 |
| Gear Misalignment 1 | .189 | .054 | 12.223 | 1 | .000 |
| Ball Energy | -96.958 | 124.581 | .606 | 1 | .436 |
| Envelope Peak to Peak | -6.652 | 1.244 | 28.571 | 1 | .000 |
| Envelope Kurtosis | 3.979 | .975 | 16.653 | 1 | .000 |
| Envelope Distributed Fault | 61.703 | 44.512 | 1.922 | 1 | .166 |
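
The Wald column in Table 8 is the squared ratio of each coefficient to its standard error. A minimal Python check follows; the function name is illustrative, and small differences from the table arise because the tabulated B and Std. Error values are already rounded.

```python
def wald_statistic(coefficient: float, std_error: float) -> float:
    """Wald chi-square for a single coefficient: (B / SE) squared."""
    return (coefficient / std_error) ** 2


# (B, Std. Error) pairs copied from Table 8.
rows = {
    "Intercept": (37.010, 7.225),
    "Shaft Order 1 (IPS)": (-9.369, 2.025),
    "Ball Energy": (-96.958, 124.581),
}
for name, (b, se) in rows.items():
    print(f"{name}: Wald = {wald_statistic(b, se):.3f}")
```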


E. TREE MODELS 

Due to the large number of possible predictor variables (CI) available in the data
set, a nonparametric, data mining approach is used to augment and check the predictions
of the logistic regression model. We use a procedure based on Classification and
Regression Trees (CART) developed by Breiman, Friedman, Olshen and Stone in 1984.
CART searches all predictors in a data set for the split which reduces the variability of
the dependent variable within the resulting subsets to the minimum. This
creates two leaves, each of which can be split again. This continues until a minimum
threshold is reached.
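
The split search described above can be illustrated with a short Python sketch for a single predictor. It uses the Gini impurity as the measure of variability, which is one common choice and an assumption here (the S-Plus tree routine may use a deviance criterion instead). The function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[Optional[float], float]:
    """Find the threshold t for the rule 'x < t' that minimizes the
    size-weighted impurity of the two resulting subsets."""
    values = sorted(set(x))
    best_threshold, best_impurity = None, float("inf")
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Toy data: one predictor, label 1 = bad, 0 = good (values are made up).
x = [0.5, 0.9, 1.2, 1.6, 2.1, 2.4]
y = [0, 0, 0, 0, 1, 1]
print(best_split(x, y))
```

A full tree repeats this search over every predictor, keeps the best split overall, and then recurses into each of the two subsets.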

The tree-fitting process provides information about predictor importance as well
as reasonable predictions. However, it is vulnerable to over-fitting and thus requires
cross-validation and pruning (limiting the number of splits). Figure 11 shows the un-pruned
classification tree created from the last 20 acquisitions of each generator in the training
set. The 65 CI determined by initial variable elimination are used to fit this tree.
Appendix F displays the remaining S-Plus training set classification tree output.

[Figure 11 tree diagram. Split labels recoverable from the plot:
Shaft.Order.1..IPS. < 1.72485; G2.1 < 38.5724; Base.Energy < 0.655714;
Half.Shaft.Order.1..IPS. < 0.284577; Misalignment.3 < -41.7041; G2.3 < 65.3999.
Leaves are labeled G or B.]

Figure 11. S-Plus Classification Tree using Training Set. The inequality above each 
split corresponds to the left branch. At each leaf "G" indicates a leaf with a higher 
proportion of good generators and similarly "B" indicates a leaf with a higher proportion 
of bad generators. 


Classification trees are an intuitive way to see how the data can be split into
subsets capable of predicting the dependent variable. However, their accuracy is not
always satisfactory. Leo Breiman introduced the concept of aggregating many different
trees and allowing each of them to “vote” on its prediction of the dependent variable
(Berk, 2005). Different aggregation methods have been developed which create the
multiple trees, or forests, in different ways. Bagging builds trees on many bootstrap
samples. Boosting is a more complicated method which first seeks out errors while
resampling from the original data in order to focus on the marginal boundaries. Accurate
trees are then given more weight in the vote; this process creates predictions with
excellent misclassification rates. Here, the random forests method is used as a
nonparametric cross-check of the logistic regression model because it builds new trees by
randomly choosing a subset of predictor variables each time. Pruning is not required, as
the aggregated voting process protects against over-fitting. This algorithm is well suited
to the large IMD-HUMS data set. Five hundred trees are fitted to the last 20 acquisitions
from each generator in the training set and allowed to vote using the randomForest
function in the R statistical environment. The resulting misclassification rate is 0.00673.

The forest model is then used to predict the entire training set (misclassification
rate 0.01420) as well as the experimental set (see Results section). One drawback of the
random forest is its “black box” nature, which restricts insight into how predictions are
made, although variable importance is obtainable.
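
To make the bootstrap, random-predictor, and voting steps concrete, the following Python sketch builds a very small forest. It is deliberately simplified and is not the randomForest algorithm used in this study: each "tree" is a single-split stump, the random predictor subset is drawn once per stump, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (predictor index, threshold, left label, right label)


def majority(labels: Sequence[int]) -> int:
    """Most common label; an empty set defaults to 0 (an arbitrary choice)."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(rows, labels, predictors) -> Stump:
    """One-split tree: the rule 'row[j] < t' with the fewest training errors."""
    best, best_errors = (predictors[0], float("inf"), 0, 0), len(labels) + 1
    for j in predictors:
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] < t]
            right = [y for row, y in zip(rows, labels) if row[j] >= t]
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if errors < best_errors:
                best, best_errors = (j, t, left_label, right_label), errors
    return best


def fit_forest(rows, labels, n_trees=25, n_predictors=1, seed=0) -> List[Stump]:
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        predictors = rng.sample(range(p), n_predictors)  # random subset of predictors
        forest.append(fit_stump([rows[i] for i in sample], [labels[i] for i in sample], predictors))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Each stump votes; the majority label wins."""
    votes = [left if row[j] < t else right for j, t, left, right in forest]
    return majority(votes)


# Toy data: two made-up predictors per acquisition, label 1 = bad, 0 = good.
rows = [(0.2, 1.0), (0.3, 1.2), (0.4, 0.9), (0.5, 1.1),
        (1.8, 3.0), (2.0, 3.3), (2.2, 2.9), (1.9, 3.1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (0.3, 1.0)), predict(forest, (2.5, 3.5)))
```

An odd number of trees is used so that the vote between two classes cannot tie.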





IV. RESULTS 


In the final phase of the study, for both the training and experimental data we 
compare the status of the generators to the estimated probability of being a bad generator 
based on the logistic regression model and the classification of being bad or good based 
on the forest of trees. In addition, we track the three HI (the HI for the shaft, gear, and 
bearing) provided by IMD-HUMS. 

Only the last 20 acquisitions for each generator are used to construct the logistic 
regression and forest of trees models. As a check of these methods, probabilities of bad 
are estimated for each acquisition in the entire two-year period for which IMD-HUMS 
data is available. As an example of how we compare results, consider generator number 
43. Generator number 43 is in the training set and classified as a good generator. The 
plot of estimated probability of bad (circular dots) based on logistic regression and 
classification of being good (0) or bad (1) (triangles) based on forest of trees is given in 
Figure 12. 

[Figure 12 plot. Legend: Logit Prediction (circles), Forest Prediction (triangles)]



Figure 12. Generator Number 43: Plot of Estimated Probability of Bad from the 
Logistic Regression Model and the Classification of Being Good (0) or Bad (1) from the 
Forest of Trees Model for the Entire Two-Year Acquisition Period 


Notice that the estimates of the probability of being bad based on the logistic 
regression vary from acquisition to acquisition, even rising above 0.5, but for the most 
part are small with the majority of estimates below 0.1. For this generator the forest of 
trees classifies the generator as good for every acquisition. 

To see the trends in the estimated probabilities from the logistic regression more
clearly, in Figure 13 we superimpose a smooth nonparametric fit of the estimated
probabilities using a loess smoother (Montgomery, 2001). At each acquisition, the loess
smoother fits a weighted regression using only the nearest neighbors to that acquisition.
The number of nearest neighbors used is governed by the span, the proportion of the total
number of observations in the data set. The larger the span, the more extensive the
smoothing. For most generators, the loess fit with a span of .3 gives a smooth estimate of
the probability of bad, which can in turn be used to predict the generator status. However,
loess fits with a span of .3 for generator numbers 7, 53, and 56 are not smooth; thus
cross-validation is used to automatically set span parameters between .3 and .5. This
cross-validation is implemented by default when using the S-Plus function. For
consistency, all graphs of the training set generator predictions in the remainder of the
chapter are shown with the S-Plus "auto" span parameter. Experimental set generator
predictions are all shown with a .3 span parameter. Figure 13 shows the loess fit for
generator number 43. For this generator, the loess fit is a straight line at zero. Thus, the
logistic regression results indicate that the generator should be classified as good.
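
The idea behind the loess smoother can be sketched in plain Python as follows. This is a simplified illustration, assuming a locally linear fit with tricube weights; the S-Plus implementation used for the figures may differ in polynomial degree, weighting details, and span selection. The function name and the toy series are hypothetical.

```python
from typing import List, Sequence


def loess(x: Sequence[float], y: Sequence[float], span: float = 0.3) -> List[float]:
    """Locally weighted linear regression evaluated at every x value."""
    n = len(x)
    k = max(2, min(n, int(round(span * n))))  # number of nearest neighbors used
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbor (x0 itself counts).
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h == 0.0:
            h = 1e-12  # guard against division by zero when x values repeat
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Toy series: estimated probabilities of bad that are zero except for one spike.
acquisitions = list(range(20))
probabilities = [0.0] * 20
probabilities[10] = 1.0
smoothed = loess(acquisitions, probabilities, span=0.3)
print(max(smoothed))
```

With a span of .3 the isolated spike is pulled well below its raw value of 1.0, which is the behavior relied on to damp sporadic high predictions.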


[Figure 13 plot. Legend: Logit Prediction (circles), Loess Logit (line), Forest
Prediction (triangles); time axis 2003 - 2005]

Figure 13. Generator Number 43: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period

Analogous to the health indicators, we use the loess smooth of the estimated
probabilities to indicate that a generator is good, or to assign a strong, moderate, or weak
classification to a generator that is bad. When the loess fit has values greater than .66,
we say that the logistic regression strongly indicates the generator as bad. When the
loess fit has values greater than .33 but no greater than .66, we say that the logistic
regression moderately indicates the generator as bad. A value above 0 but no greater than
.33 shows a weak indication, and a straight loess fit line at 0 indicates good. A summary
of the results is given in Table 9 for the training data, and special cases are discussed in
detail in this chapter.

Table 9. Classification of Training Set Generators Based On the Logistic
Regression Fit. (>.66 - 1.0 Strong, >.33 - .66 Moderate, >0.0 - .33 Weak, 0.0 Good)

| Prior Classification | Logit: Good | Logit: Weak | Logit: Moderate | Logit: Strong | Total |
|---|---|---|---|---|---|
| Good | 40 | 6 | 0 | 0 | 46 |
| Bad | 0 | 0 | 1 | 5 | 6* |
| Total | | | | | 52 |

* includes additional generator discovered during model formulation


The rule used for the forest of trees results is that a majority of 1.0 (bad) predictions
is a strong classification and a minority of 1.0 predictions is a moderate classification.
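
The two classification rules above can be written as a pair of small Python functions. This is a sketch under stated assumptions: the logit rule is applied to the peak of the loess curve, a forest with no bad votes at all is labeled good, and an exact tie in the forest votes is treated as moderate. None of these edge cases is spelled out in the text, and the function names are hypothetical.

```python
from typing import Sequence


def classify_logit(loess_values: Sequence[float]) -> str:
    """Label a generator from the peak of its loess-smoothed probability of bad."""
    peak = max(loess_values)
    if peak > 0.66:
        return "strong"
    if peak > 0.33:
        return "moderate"
    if peak > 0.0:
        return "weak"
    return "good"


def classify_forest(predictions: Sequence[int]) -> str:
    """Label a generator from its forest predictions (1 = bad, 0 = good)."""
    bad = sum(1 for p in predictions if p == 1)
    if bad == 0:
        return "good"  # assumption: no bad predictions means good
    return "strong" if bad > len(predictions) / 2 else "moderate"


print(classify_logit([0.0, 0.1, 0.7]), classify_forest([1, 1, 0]))
```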

A. RESULTS FOR GENERATORS IN THE TRAINING SET 

After fitting the logistic regression model and forest of trees to the last 20 
acquisitions of each generator in the training set, the models are used to predict generator 
state throughout the entire two-year period in which the training set was collected. This 
serves as additional validation of the models, as well as providing additional information 
about behavior of the faulty generators. Appendix B provides an overview, while 
subsequent subsections cover specific findings for generators in the training set. 

1. Four Bad Generators: Numbers 22, 31, 53, 56

Of the 52 generators in the training set, five are classified as bad. Four of these
(numbers 22, 31, 53, 56) are similar in that they have high SO1 CI. For all four of these,
the generator was determined to be faulty upon inspection. Figures 12 and 13 provide an
example of the plot for a good generator. In contrast, Figure 14 shows the corresponding
plots for the four generators from the training set for which damage was found upon
inspection. It is not surprising that both the logistic regression and forest of trees models
classify generators with proven damage as bad, since these generators were used for
model fitting and their CI have values which form clusters separated from the values of
the CI from the rest of the training set (see Figure 6). In particular, these generators have
high SO1 and SO2 CI compared to the good generators in the training set. Generator
number 53 is unusual in the amount of variation present between acquisitions, as shown
in Figure 14. There may be something different about the failure mode for this generator,
but no clear-cut, specific cause has been identified that accounts for this variation; it is a
phenomenon worthy of study.

Figure 14. Generators Numbers 22, 31, 53, 56: Plot of Estimated Probability of Bad
from the Logistic Regression Model with Smoothing and the Classification of Being 
Good (0) or Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition 
Period 


2. Generator Number 9 

Generator number 9 is classified as a bad generator because of an actual failure.
During operation the helicopter did not receive electrical power from this generator,
resulting in the illumination of a generator-fail warning light. After the generator was
replaced with a new one, the problem went away. Both the logistic regression and forest
of trees models classify the number 9 generator as bad, but not strongly (see Figure 15).
These results are consistent with the plot in Figure 6, which shows that generator number
9 is not easily distinguished from the good generators. The figure depicts good generator
data points (light grey dots if viewed in the non-color copy or yellow dots if viewed in
the color copy) and bad generator data points (dark grey dots if viewed in the non-color
copy or purple dots if viewed in the color copy) using five important CI as variable
inputs. The dark grey dots intermingled with the good generator data points are primarily
from generator number 9. This raises the question: Was the generator failure merely
electrical in nature (such as an electrical short-circuit) and not mechanical, and therefore
undetectable by the IMD-HUMS CI? This generator may be classified as bad in the
logistic regression and forest of trees models only because it is in the training set and was
used to build both models. Perhaps it is detected in the logistic and forest of trees models
due to over-fitting as a result of its binary bad classification.


[Figure 15 plot. Legend: Logit Prediction, Loess Logit, Forest Prediction; time axis
2003 - 2005]


Figure 15. Generator Number 9: Plot of Estimated Probability of Bad from the 
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or 
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period 

For generator number 9 there are some acquisitions for which the bearing HI is in
the warning range, but these warnings are present on many good generators. To justify
inclusion of generator number 9 on the bad generator list, two mini-experiments are
performed. In the first, the data is perturbed by giving a binary classification of good (0)
to generator number 9 and fitting a new logistic regression model. Alarmingly, generator
number 9 is then predicted to be a perfectly good generator. In this modified data the
only bad generators are the four generators with high SO1 CI (numbers 22, 31, 53, 56).
In the second mini-experiment, the data is perturbed further by giving a binary
classification of bad (1) to a perfectly good generator, generator number 26. The new
logistic regression model gives this good generator (now labeled bad) estimated
probabilities of being bad no higher than .3, yet gives generator number 9 (still labeled
good) estimated probabilities in the .5 to .95 range. This suggests generator number 9
is not a case of a good generator misclassified as a bad generator. Therefore generator
number 9 is retained as a bad generator and is an important element of the logistic and
forest of trees models. The mode of failure of generator number 9 may be different from
the other failure modes and unique to the data set.

3. Generator Initially Classified as Good 

One generator initially classified as good in the training set is detected by the
logistic regression model as being misclassified. For generator number 7 the logistic
model gives strong estimates of being bad (values of 1.0, much stronger than for generator
number 9), which then rapidly drop off to estimates of being good (values of 0.0) around
July 2005 (see Figure 16). Figure 17 plots in EXCEL the three IMD-HUMS produced
health indicators and depicts the dramatic change from a bad condition to a good
condition for generator number 7.

Figure 16. Generator Number 7: Plot of Estimated Probability of Bad from the
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or 
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period 



Figure 17. Generator Number 7: EXCEL Plot of IMD-HUMS Produced Shaft, Gear 
and Bearing HI (note the change from bad to good HI) 

The generator also had strong bearing HI indications which drop off at the same
time as the logistic regression predictions. Based on these results Goodrich Corporation
re-examined the records for generator number 7 and confirmed that the accessory gearbox,
which houses the gear and bearing for the generator, had been replaced on the aircraft
(Bechhoefer, 2005). The model has properly predicted a generator to be in an unhealthy
condition, and likewise predicts the post-maintenance health as good. The
pre-maintenance acquisitions of generator number 7 were then reclassified as bad and the
logistic regression model was refitted. The null deviance increased from 658.42 to
743.86. The residual deviance decreased slightly from 77.76 to 77.72. The final forest of
trees model was then also refit including generator number 7 as a bad generator.
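
The change in null deviance follows directly from the change in the number of acquisitions labeled bad. The Python sketch below assumes that the fitting data contain 52 generators with 20 acquisitions each (1040 in total, consistent with the 1039 degrees of freedom in Table 4), that 100 acquisitions (five generators) were labeled bad before the reclassification, and that 120 (six generators) were labeled bad afterwards. The function name is illustrative.

```python
import math


def null_deviance(n_bad: int, n_total: int) -> float:
    """-2 log likelihood of an intercept-only model for a binary response."""
    p = n_bad / n_total
    n_good = n_total - n_bad
    return -2.0 * (n_bad * math.log(p) + n_good * math.log(1.0 - p))


print(round(null_deviance(100, 1040), 2))  # five bad generators
print(round(null_deviance(120, 1040), 2))  # six bad generators, including number 7
```

Under these assumptions the two values reproduce the reported null deviances of 658.42 and 743.86.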

4. Loess Smoothing and “Weak” or “Scattered” Logistic Regression
Predictions

The behavior of the CI is complex and highly variable in nature. Spikes which are not
easily linked to a specific cause can occur; further, it is difficult to determine their
periodicity. This complex behavior can be seen in varying degrees on many generators,
and it affects HI calculations and logistic regression predictions. The forest of trees
appears more robust to these fluctuations than the logistic regression, possibly due to its
repetitive resampling and voting process. To avoid high false alarm rates, loess
smoothing is performed on the logistic regression predictions using S-Plus (smoothing
parameter 0.3 or auto-default for the training set, 0.3 for the experimental set). Generator
logistic prediction results are considered bad if their loess curve ever moves above .33,
with anything over .66 being considered a strong prediction. A “weak” prediction occurs
when there are enough spikes to pull the loess curve above zero. A “scattered”
classification occurs when there are one or more spikes but the loess smoothes them
down to zero. Figure 18 shows examples of “weak” and “scattered” logistic regression
predictions.

[Figure 18 panels: Example of “Scattered”; Example of “Weak”]

Figure 18. Generators Numbers 2 and 18: Examples of "Scattered" and "Weak" Logit
Plots of Estimated Probability of Bad from the Logistic Regression Model with
Smoothing and the Classification of Being Good (0) or Bad (1) from the Forest of Trees
Model for the Entire Two-Year Acquisition Period


5. Good Generators 

Only three generators classified as good have any bad (1) forest of trees
predictions (numbers 10, 39, 65). These few bad predictions are sporadic, and each time
they are accompanied by weak or scattered logit predictions as depicted in Figure 19.
However, with the loess smoother applied the logistic regression model classifies these
three generators strictly as good.

Figure 19. Generator Number 39: Example of Sporadic Forest of Trees Predictions,
Plot of Estimated Probability of Bad from the Logistic Regression Model with Smoothing 
and the Classification of Being Good (0) or Bad (1) from the Forest of Trees Model for 
the Entire Two-Year Acquisition Period 


B. RESULTS FOR GENERATORS IN THE EXPERIMENTAL SET 

With generator number 7 reclassified as bad prior to its maintenance and with 
both models refit with this reclassification the logistic regression and forest of trees 
models are applied to the experimental set. A summary of the logit results is given in 
Table 10. 

Table 10. Classification of Experimental Set Generators Based On the Logistic
Regression Fit. (>.66 - 1.0 Strong, >.33 - .66 Moderate, >0.0 - .33 Weak, 0.0 Good)

| Prior Classification | Logit: Good | Logit: Weak | Logit: Moderate | Logit: Strong | Total |
|---|---|---|---|---|---|
| Good | 4 | 0 | 0 | 1 | 5 |
| Bad | 1* | 0 | 0 | 0 | 1 |
| Watch List | 2 | 2 | 1 | 1 | 6 |
| Total | | | | | 12 |

* generator #33 removed due to generator caution light; no IMD-HUMS indications or
model predictions of being bad


1. Classified Generators 

The lone experimental generator classified as bad, number 33, which was taken off the
aircraft due to a generator caution light, has no logistic regression or forest of trees
predictions of bad condition and no HI warnings (Appendix C). Thus the evidence
points to a strictly electrical cause of failure, such as a short-circuit.

Of the five generators classified as good (numbers 15, 40, 58, 64, 66) in the
experimental data set, four show no bad predictions made by either the logistic regression
or forest of trees models. Generator number 15 shows a highly unusual and fairly strong
logistic regression result comparable to generator number 30 of the experimental data set
and generator number 9 of the training set. However, those generators also show bad
predictions with forest of trees and at least some HI warnings. Generator number 15 has
no bad predictions from the forest of trees model (Figure 20).

Figure 20. Generator Number 15 (Aircraft 9326493, Left): Plot of Estimated
Probability of Bad from the Logistic Regression Model with Smoothing and the 
Classification of Being Good (0) or Bad (1) from the Forest of Trees Model for the Entire 
Two-Year Acquisition Period 

A request was sent to the users for additional information concerning the current 
state of this generator and whether any maintenance had been performed. A detailed 
inspection of maintenance records indicates that indeed the generator had been replaced 
during a major maintenance reset in October 2004. This coincides directly with the drop 
from strong to weak logit prediction. The logit model has again properly identified a 
generator in bad condition. 

2. Unclassified "Watch List" Generators

The logistic regression and forest of trees models are then used to predict the 
status of the generators of questionable status (watch list) in the experimental data set. 
The summary table and a complete set of result graphs are given in Appendix C. 

Generator Number 30, which has shaft HI alarms, is predicted as bad fairly 
strongly by both the logistic regression and forest of trees models. This generator is 
unusual due to the sharp increase in both the predicted probability of bad and the number 
of instances of bad classification that occurs while shifting into alarm status (Figure 21). 

Figure 21. Generator Number 30 (Aircraft 9426545, Left): Plot of Estimated
Probability of Bad from the Logistic Regression Model with Smoothing and the 
Classification of Being Good (0) or Bad (1) from the Forest of Trees Model for the Entire 
Two-Year Acquisition Period 

Generator numbers 21 and 48, which have shaft HI alarms, are predicted as bad fairly
strongly by the forest of trees model and weakly by the loess-smoothed logistic regression
model (Figure 22). The high variability of these generators keeps the loess curve from
climbing, but such high variance can be a symptom of impending failure. Therefore the
subjective assessment is made that these are indeed bad generators.

Figure 22. Generator Numbers 21 and 48: Plot of Estimated Probability of Bad from
the Logistic Regression Model with Smoothing and the Classification of Being Good (0) 
or Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period 


Generator number 55 has moderate predictions from both the logistic regression
and the forest of trees models (Figure 23). Interestingly, the logistic regression model
shows an improvement in the generator's state while the forest of trees model predicts a
bad state only sporadically toward the latter portion of the acquisitions. This shows that
the models indeed function differently, even though they tend to agree with each other.

[Figure 23 plot; time axis 2004 - 2005]


Figure 23. Generator Number 55: Plot of Estimated Probability of Bad from the 
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or 
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period 

Generator numbers 6 and 24 are on the watch list but are not predicted
bad by the logistic regression or forest of trees models. Notably, generator number 24
does have some sporadic logit predictions, as seen in Figure 24. The subjective
assessment is that they are not bad enough to warrant replacement.


[Figure 24 plot, GEN: 24. Vertical axis 0.0 to 1.0; time axis Jan 2004 to Jul 2005.
Legend: Logit Prediction, Loess Logit, Forest Prediction]

Figure 24. Generator Number 24: Plot of Estimated Probability of Bad from the 
Logistic Regression Model with Smoothing and the Classification of Being Good (0) or 
Bad (1) from the Forest of Trees Model for the Entire Two-Year Acquisition Period 

V. CONCLUSION


This thesis demonstrates that a logistic regression model which predicts the 
overall state of a UH-60L electrical generator can be fit using IMD-HUMS data collected 
with known cases of failed generators and properly operating generators. Generator status 
serves as the dependent, binary response variable. The independent predictor variables 
can be chosen using correlation with the dependent variable, backwards elimination using 
p-values, and classification rates. The model is refined by incorporating new failures as 
they occur into the data set and refitting to produce a more sensitive and accurate 
prediction model. This results in an accurate picture of a "bad" generator and generators 
susceptible to failure. 

A random forest of trees was also created as a nonparametric augmentation to the 
prediction effort. It serves to quickly and automatically sample combinations of the 
predictors, aggregating votes in order to make accurate predictions which are fairly 
robust to false alarms. A single classification or regression tree can be created as a 
parallel effort in understanding the important predictors, helping during variable selection 
for a logistic regression model. 

A. APPLICATIONS 

Due to the highly variable nature of the predictor values, this model is less
successful at predicting states from a single acquisition. In addition, this type of model
may not be able to predict failures of types not included in the model building. As data is
accrued, these previously unobserved failure modes should become increasingly rare. No
effort to supplant any algorithms currently on board the aircraft is recommended.
The model's greatest value may be in the picture it creates of how an at-risk mechanical
component behaves. This technique is easily transferable to other components on the
helicopter as well as to other, completely different, platforms. The beauty of these models
and the process of deriving them is that the relatively accurate state pictures they produce
are attained with minimal effort, time and expense. The only requirements are an
understanding of the system and data set, off-the-shelf statistical software and a computer.

Concurrent with data collection, the development of prediction models for
important components, for example transmissions or engines, could be initiated. The
selection of pertinent CI predictors should draw not only on an understanding of the
system's mechanics and vibrations but also on parametric and nonparametric
statistical approaches. As more data, including component failures, is collected, the
models are refined. The use of Ggobi in detecting different failure modes is a particularly
simple and quick way to investigate the IMD-HUMS data. These real-data based models,
which are easily derived, are pertinent in the move toward Condition Based Maintenance.

For instance, periodically a serious defect is found on one or more single-type 
aircraft resulting in a grounding of the fleet. ASAM, SoFM and IRAC messages dictate 
specific inspections or corrective maintenance actions which must be accomplished on 
each aircraft prior to the resumption of flight operations. The time required to fulfill the 
requirements of these messages severely impacts both real-world and training operations. 
In the move toward CBM, this dual logistic regression and forest of trees process could 
be used to focus initial inspection efforts on only those aircraft whose “picture” 
resembles the problem aircraft. The other aircraft could continue operations and get 
inspected at the next convenient maintenance period. 

Another practical application of this process is to reduce data collection 
requirements of the onboard system. Important predictor variables which continually 
show up in logistic regression and forest of trees models would be retained while 
variables which never show importance become candidates for removal. This would free 
up valuable memory space in the onboard system. 
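
As a rough illustration of this screening idea, the sketch below counts how often each recorded variable is used across a series of fitted models and lists the variables that were never used. The function names and the example variable sets are hypothetical; a real screening would also weigh engineering judgment about each CI.

```python
from collections import Counter


def selection_counts(models, all_variables):
    """Count how many fitted models used each recorded variable."""
    counts = Counter({v: 0 for v in all_variables})
    for used in models:
        # Ignore names that are not in the recorded variable list.
        counts.update(v for v in set(used) if v in counts)
    return counts


def removal_candidates(models, all_variables):
    """Variables that no model used, sorted by name."""
    counts = selection_counts(models, all_variables)
    return sorted(v for v, n in counts.items() if n == 0)


# Hypothetical predictor sets from two model refits.
refits = [{"Base.Energy", "G2.1"}, {"G2.1"}]
recorded = ["Base.Energy", "G2.1", "Tone.Energy"]
print(removal_candidates(refits, recorded))
```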

B. RECOMMENDATIONS FOR FURTHER STUDY 

Aspects critical to the development of better component health prediction models 
are the incorporation of variance within the multiple CI, concise variable selection, and 
time-series trends. 

It is known that an increase in CI variability over time is an indication of 
deteriorating component health, but the thresholds between normality and abnormality of 
variance for the many CI have not yet been determined. The large data sets now being 
produced by IMD-HUMS can be used to estimate the variance of the CI. 
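
One simple way to examine CI variability over time is a moving-window sample variance. The sketch below is only an illustration under stated assumptions: the window length, the function name `rolling_variance` and the readings are made up, and no threshold between normal and abnormal variance is implied.

```python
from statistics import variance


def rolling_variance(values, window):
    """Sample variance of each consecutive window of CI readings."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        variance(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Made-up CI readings: steady at first, then increasingly erratic.
readings = [1.0, 1.1, 0.9, 1.0, 1.6, 0.4, 1.9, 0.2]
print(rolling_variance(readings, 4))
```

A rising sequence of window variances for a given CI would be the kind of signal that the thresholds discussed above are meant to classify.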

Variable selection in component health prediction models is 
also worthy of further analysis. If the number of CI can be definitively limited to a few 
very effective predictors, the "curse of dimensionality" can be avoided and 
component health distributions can be estimated accurately. 

The multiple acquisitions over time for each CI can be used for trend analysis. 
Rates of change in the CI values incorporated in the prediction models could ultimately 
be used in accurately estimating available component lifetime. The loess smoothing used 
in the logistic regression model serves as a primitive attempt to account for trends. 
However, the data provides potential to use time series information in a much more 
effective manner. Further study of the time series relationships may illuminate factors 
which cause the seemingly random oscillations in CI. 
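
A rate of change for a single CI can be estimated, in the simplest case, as the least-squares slope of its values against acquisition time. The sketch below assumes a straight-line trend, which is a simplification rather than a method used in this study; the function name `trend_slope` and the data are hypothetical.

```python
def trend_slope(times, values):
    """Ordinary least-squares slope of CI values against acquisition time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired observations")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("acquisition times must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical CI values at four acquisition times (arbitrary units).
print(trend_slope([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]))
```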

The further study of variability and trends could help in addressing the great deal 
of noise present in the data. Random data spikes complicate the setting of thresholds and 
the development of accurate, real-time state prediction algorithms. In the logistic 
regression model, this created the need for loess smoothing. While the random forest of 
trees is more robust to false alarms caused by certain spikes, the Type II error rates are 
not known and the model may be too insensitive. 

Ideally, the best model would determine from a single acquisition a 
component's state and the remaining lifetime of use. The development of such models 
requires further study in understanding the distribution of failures for each component, 
variability within and among CI, and trending of CI over time. 

APPENDIX A IMD-HUMS SHAFT, GEAR AND BEARING CI 


Each IMD-HUMS acquisition concerning the shaft, spur gear and bearings of a 
generator results in the reading of the 169 variables listed here. The response variables 
and the generator number are added for this study. The subset of 65 potential predictors 
remaining after initial variable elimination are highlighted in grey. 


RESPONSE 

status 

status binary 

SHAFT CI 

Component Name- shaft 

Date_Time 

ah.Tail 

GEN Number 

Torque 

Airspeed 

Main Rotor Speed 
OAT 

MGBTEMP 

Regime 

Opidx 

OpRTR 

OpNPH 

RTRUSG 

NPH 

Health 

PriRAW 

SecRAW 

COMP 

SENS 

Eng1 Torque 
Eng2Torque 
DQ 

XAXIS 

Shaft Order 1 (IPS) 

Shaft Order 2 (IPS) 


Shaft Order 3 (IPS) 

Half Shaft Order (IPS) 

Shaft Order 1 (OBS) 

Shaft Order 2 (OBS) 

Shaft Order 3 (OBS) 
RecomputedHealthIndicator 
Shaft Order 1 (g) 

Shaft Order 2 (g) 

Shaft Order 3 (g) 

Half Shaft Order (g) 

Half Shaft Order (OBS) 

Sig Avg Peak to Peak 
Sig Avg RMS 
Health Indicator 
Sig Avg Crest Factor 
Sig Avg Skewness 
Sig Avg Kurtosis 
Sig Avg Fifth Moment 
Sig Avg Sixth Moment 
Residual Peak to Peak 
Residual RMS 
Residual Crest Factor 
Residual Skewness 
Residual Kurtosis 
Residual Fifth Moment 
Residual Sixth Moment 
Sig Avg El 
EO Peak to Peak 
EO RMS 
EO Crest Eactor 
EO Skewness 
EO Kurtosis 
EO Fifth Moment 
EO Sixth Moment 
Gear Distributed Fault 
Resample Rate 

MeasuredShaft Speed Phase Kurtosis 
EOEl 

Total Torque 
Airspeed 

Main Rotor Speed 
Engine 1 GasTurbineSpeed 
Engine 1 PowerTurbineSpeed 
Engine 1 Torque 
Engine2GasTurbineSpeed 
Engine2PowerTurbineSpeed 
Engine2Torque 

GEAR CI 

Date_Time 

Tail 

Name-gear 

Health 

PriRAW 

SecRAW 

COMP 

SENS 

DQ 

XAXIS 

Residual Kurtosis 
Residual RMS 
Sideband Mod 1 
Narrowband CrestFactor 
Gear Distributed Fault 
G2-1 

Residual Peak to Peak 
RecomputedHealthIndicator 
Sig Avg Peak to Peak 
Sig Avg Kurtosis 
Sig Avg RMS 
Residual Skewness 
Residual Crest Factor 
Residual Fifth Moment 
Residual Sixth Moment 
Gear Misalignment 1 
Sideband Mod 2 
sm_3 AS Sideband Mod 3 


Health Indicator 
Gear Misalignment 2 
Gear Misalignment 3 
Narrowband RMS 
Narrowband Peak to Peak 
Narrowband Skewness 
Narrowband Kurtosis 
Narrowband FifthMoment 
Narrowband Sixth Moment 
Instantaneous Frequency 
CSM 

AM Kurtosis 
Derivative AM Kurtosis 
FM Kurtosis 
Derivative FM Kurtosis 
FM Peak to Peak 
G2-2 
G2-3 

BEARING CI 

Date.Time 

ah.Tail 

BearingName 

BearingPart 

Health 

PriRAW 

SecRAW 

COMP 

brg. Priority 

DQ 

XAXIS 

Ball Energy (Norm) 

Cage Energy (Norm) 

Inner Race Energy (Norm) 
Outer Race Energy (Norm) 
Bearing Energy 15k-20k 
Total Bearing Energy (Norm) 
Envelope RMS 
Recomputed Health Indicator 
Ball Energy 
Cage Energy 
Inner Race Energy 
Outer Race Energy 
Total Bearing Energy 
Envelope Peak to Peak 
Envelope Crest Factor 
Envelope Skewness 
Envelope Kurtosis 
Envelope Fifth Moment 
Envelope Sixth Moment 
Health Indicator 
Envelope Distributed Fault 
Tone Energy 
Base Energy 
Ball Mod Cage 
Inner Race Mod Ball 
Inner Race Mod Cage 
Inner Race Mod Outer 


Outer Race Mod Ball 
Outer Race Mod Cage 
TotalBearingCoupling Energy 
Ball Mod Shaft 
Cage Mod Shaft 
Inner Race Mod Shaft 
Outer Race Mod Shaft 
TotalShaft-Bearing Coupling 
Ball Spin Ratio 
Cage Ratio 
Inner Race Ratio 
Outer Race Ratio 

APPENDIX B TRAINING SET RESULTS SUMMARY 


Appendices B and C give the complete logistic regression and Random Forest of 
Classification Trees results. Appendices D and E give a complete set of IMD-HUMS HI 
for comparison. Note the change in the method of shaft HI computation around October 
2004. 


status binary
    Bd          denotes generators with proven faults (bad)
    Gd          denotes generators without proven faults (good)

HI: Health Indication provided by IMD-HUMS on-board algorithms
    S           shaft warning (SS denotes alarm status)
    G           gear warning
    B           bearing warning (BB denotes alarm status)

Logit
    strong      loess smoothed values over 0.66
    moderate    loess smoothed values over 0.33
    weak        loess smoothed values between 0 and 0.33
    scattered   logit spikes of 1.0 that do not pull loess curve above 0

Forest
    strong      majority of classifications are 1.0 (bad)
    moderate    minority of classifications are 1.0 (bad)


Helicopter    Generator            Generator   HI
Tail Number   Side        Status   Number      S,G,B     Logit       Forest

9126351       Left        Gd       1           G
9226432       Left        Gd       2           BB
9226435       Left        Gd       3
9226438       Left        Gd       4
9226439       Left        Gd       5
9226443       Left        Gd,Bd    7           G,B       strong      strong
9226446       Left        Gd       8
9226450       Left        Gd       10          B         weak        moderate
9226453       Left        Gd       11          S
9226455       Left        Gd       12          G,B
9326477       Left        Gd       13
9326485       Left        Gd       14          G
9326500       Left        Gd       16
9326506       Left        Gd       17          B
9326507       Left        Gd       18          S,B       weak
9326509       Left        Gd       19                    weak
9326515       Left        Gd       20
9326524       Left        Gd       25          B
9326530       Left        Gd       26
9426533       Left        Gd       27          B
9426534       Left        Gd       28          S,G,B     weak
9426537       Left        Gd       29          G,B       weak
9426549       Left        Gd       32          G
9126351       Right       Gd       35
9226432       Right       Gd       36
9226435       Right       Gd       37          S
9226438       Right       Gd       38
9226439       Right       Gd       39          SS,G,B    scattered   moderate
9226443       Right       Gd       41          G,B       scattered
9226446       Right       Gd       42
9226450       Right       Gd       43          G,BB
9226453       Right       Gd       44          B
9226455       Right       Gd       45          G,B
9326477       Right       Gd       46
9326485       Right       Gd       47          G,BB
9326500       Right       Gd       49          S,G
9326506       Right       Gd       50          G
9326507       Right       Gd       51          B
9326509       Right       Gd       52          G
9326515       Right       Gd       54          G
9326518       Right       Gd       57
9326524       Right       Gd       59          S,G,B
9326530       Right       Gd       60          G
9426533       Right       Gd       61          G
9426534       Right       Gd       62                    weak
9426537       Right       Gd       63          S,G
9426549       Right       Gd       65          SS,G      scattered   moderate
9226450       Left        Bd       9           B         moderate    moderate
9326518       Left        Bd       22          SS        strong      strong
9426549       Left        Bd       31          SS
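
The Logit and Forest categories in the legend of this appendix can be expressed as simple rules. The sketch below is an interpretation under stated assumptions: the logit category is taken from the highest loess-smoothed value for a generator, a generator with no bad classifications is left blank, and the handling of values exactly on a boundary (0.33, 0.66, or an even split of forest votes) is a choice made here, not something the legend specifies. The function names are hypothetical.

```python
def logit_category(max_loess, has_unit_spike):
    """Map the peak loess-smoothed logit value to a legend category.

    max_loess: highest loess-smoothed value for the generator.
    has_unit_spike: True if any raw logit prediction reached 1.0.
    """
    if max_loess > 0.66:
        return "strong"
    if max_loess > 0.33:
        return "moderate"
    if max_loess > 0:
        return "weak"
    # Spikes that never lift the smoothed curve above 0.
    return "scattered" if has_unit_spike else ""


def forest_category(classifications):
    """Map per-acquisition forest classifications (1 = bad) to a category."""
    bad = sum(1 for c in classifications if c == 1)
    if bad == 0:
        return ""
    # An exact half is treated as a minority (assumption of this sketch).
    return "strong" if bad > len(classifications) / 2 else "moderate"


print(logit_category(0.7, False), forest_category([1, 1, 0]))
```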

APPENDIX C EXPERIMENTAL SET RESULTS SUMMARY 

APPENDIX D TRAINING SET HI 

APPENDIX E EXPERIMENTAL SET HI 

APPENDIX F TRAINING SET CLASSIFICATION TREE 


*** Tree Model *** 

Classification tree: 
tree(formula = status ~ Shaft.Order.1..IPS. + Shaft.Order.2..IPS. + 
    Shaft.Order.3..IPS. + Half.Shaft.Order..IPS. + Gear.Distributed.Fault + 
    Residual.Kurtosis + Residual.RMS + Sideband.Mod.1 + 
    Narrowband.CrestFactor + G2.1 + Residual.Peak.to.Peak + 
    Sig.Avg.Peak.to.Peak + Sig.Avg.Kurtosis + Sig.Avg.RMS + 
    Residual.Skewness + Residual.Crest.Factor + Residual.Fifth.Moment + 
    Residual.Sixth.Moment + Gear.Misalignment.1 + sm.3.AS.Sideband.Mod.3 + 
    Gear.Misalignment.2 + Gear.Misalignment.3 + Narrowband.RMS + 
    Narrowband.Peak.to.Peak + Narrowband.Skewness + Narrowband.Kurtosis + 
    Narrowband.FifthMoment + Narrowband.Sixth.Moment + 
    Instantaneous.Frequency + CSM + AM.Kurtosis + Derivative.AM.Kurtosis + 
    FM.Kurtosis + Derivative.FM.Kurtosis + FM.Peak.to.Peak + G2.2 + G2.3 + 
    Bearing.Energy.15k.20k + Envelope.RMS + Ball.Energy + Cage.Energy + 
    Inner.Race.Energy + Outer.Race.Energy + Total.Bearing.Energy + 
    Envelope.Peak.to.Peak + Envelope.Crest.Factor + Envelope.Skewness + 
    Envelope.Kurtosis + Envelope.Fifth.Moment + Envelope.Sixth.Moment + 
    Envelope.Distributed.Fault + Tone.Energy + Base.Energy + 
    Ball.Mod.Cage. + Inner.Race.Mod.Ball + Inner.Race.Mod.Cage + 
    Inner.Race.Mod.Outer + Outer.Race.Mod.Ball + Outer.Race.Mod.Cage + 
    Total.Bearing.Coupling.Energy + Ball.Mod.Shaft + Cage.Mod.Shaft. + 
    Inner.Race.Mod.Shaft + Outer.Race.Mod.Shaft + 
    Total.Shaft.Bearing.Coupling, data = CGDNtrainingCUT.65, na.action = 
    na.exclude, mincut = 5, minsize = 10, mindev = 0.01) 

Variables actually used in tree construction: 
[1] "Shaft.Order.1..IPS."    "G2.1"    "Base.Energy" 
[4] "Gear.Misalignment.3"    "G2.3"    "Half.Shaft.Order..IPS." 

Number of terminal nodes: 7 
Residual mean deviance: 0.01577 = 16.29 / 1033 
Misclassification error rate: 0.004808 = 5 / 1040 

node), split, n, deviance, yval, (yprob) 
      * denotes terminal node 

 1) root 1040 658.400 G ( 0.09615 0.90380 ) 
   2) Shaft.Order.1..IPS.<1.72485 970 281.300 G ( 0.03299 0.96700 ) 
     4) G2.1<38.5724 200 175.900 G ( 0.16000 0.84000 ) 
       8) Base.Energy<0.655714 154 0.000 G ( 0.00000 1.00000 ) * 
       9) Base.Energy>0.655714 46 56.530 B ( 0.69570 0.30430 ) 
        18) Gear.Misalignment.3<-41.7041 11 0.000 G ( 0.00000 1.00000 ) * 
        19) Gear.Misalignment.3>-41.7041 35 20.480 B ( 0.91430 0.08571 ) 
          38) G2.3<65.3999 7 9.561 B ( 0.57140 0.42860 ) * 
          39) G2.3>65.3999 28 0.000 B ( 1.00000 0.00000 ) * 
     5) G2.1>38.5724 770 0.000 G ( 0.00000 1.00000 ) * 
   3) Shaft.Order.1..IPS.>1.72485 70 18.160 B ( 0.97140 0.02857 ) 
     6) Half.Shaft.Order..IPS.<0.284577 65 0.000 B ( 1.00000 0.00000 ) * 
     7) Half.Shaft.Order..IPS.>0.284577 5 6.730 B ( 0.60000 0.40000 ) * 
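
For readers who want to apply the fitted tree by hand, its seven terminal nodes can be transcribed as nested comparisons. The sketch below is a Python transcription of the split values and node proportions printed above, not output from the statistical software; the function name `classify` is hypothetical, the first proportion in each node is read as the share of bad (B) acquisitions, and sending a value exactly equal to a threshold down the ">" branch is an assumption.

```python
def classify(ci):
    """Return (predicted class, training share of bad acquisitions).

    ci maps the S-PLUS variable names used in the tree to CI values.
    Only the variables on the path actually taken are read.
    """
    if ci["Shaft.Order.1..IPS."] < 1.72485:
        if ci["G2.1"] < 38.5724:
            if ci["Base.Energy"] < 0.655714:
                return "G", 0.0          # node 8
            if ci["Gear.Misalignment.3"] < -41.7041:
                return "G", 0.0          # node 18
            if ci["G2.3"] < 65.3999:
                return "B", 0.5714       # node 38
            return "B", 1.0              # node 39
        return "G", 0.0                  # node 5
    if ci["Half.Shaft.Order..IPS."] < 0.284577:
        return "B", 1.0                  # node 6
    return "B", 0.6                      # node 7


# Hypothetical acquisition with a high first shaft order.
print(classify({"Shaft.Order.1..IPS.": 2.0, "Half.Shaft.Order..IPS.": 0.1}))
```

Note that the splits on G2.3 and Half.Shaft.Order..IPS. do not change the predicted class; they only separate pure nodes from mixed ones, which is where all five training misclassifications occur (three in node 38 and two in node 7).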